In this paper, we propose an inverse reinforcement learning method for architecture search (IRLAS), which trains an agent to search for network structures that are topologically inspired by human-designed networks. Most existing architecture search approaches largely neglect the topological characteristics of architectures, which results in complicated architectures with high inference latency. Motivated by the fact that human-designed networks are topologically elegant and fast at inference, we propose a mirror stimuli function, inspired by biological cognition theory, to extract the abstract topological knowledge of an expert human-designed network (ResNeXt). To avoid imposing too strong a prior over the search space, we introduce inverse reinforcement learning to train the mirror stimuli function and exploit it as heuristic guidance for architecture search, which generalizes easily to different architecture search algorithms. On CIFAR-10, the best architecture searched by IRLAS achieves a 2.60% error rate. In the ImageNet mobile setting, our model achieves a state-of-the-art top-1 accuracy of 75.28% while being 2-4x faster than most auto-generated architectures. A fast version of this model runs 10% faster than MobileNetV2 while maintaining higher accuracy.
https://arxiv.org/abs/1812.05285
Touch sensing is widely acknowledged to be important for dexterous robotic manipulation, but exploiting tactile sensing for continuous, non-prehensile manipulation is challenging. General-purpose control techniques that can effectively leverage tactile sensing, as well as accurate physics models of contacts and forces, remain largely elusive, and it is unclear how to even specify a desired behavior in terms of tactile percepts. In this paper, we take a step towards addressing these issues by combining high-resolution tactile sensing with data-driven modeling using deep neural network dynamics models. We propose deep tactile MPC, a framework for learning to perform tactile servoing from raw tactile sensor inputs, without manual supervision. We show that this method enables a robot equipped with a GelSight-style tactile sensor to manipulate a ball, an analog stick, and a 20-sided die, learning from unsupervised autonomous interaction and then using the learned tactile predictive model to reposition each object to user-specified configurations, indicated by a goal tactile reading. Videos, visualizations and the code are available here: this https URL
https://arxiv.org/abs/1903.04128
Singing voice conversion is the task of converting a song sung by a source singer to the voice of a target singer. In this paper, we propose a parallel-data-free, many-to-one voice conversion technique for singing voices. A phonetic posterior feature is first generated by decoding singing voices through a robust Automatic Speech Recognition (ASR) engine. Then, a trained Recurrent Neural Network (RNN) with a Deep Bidirectional Long Short-Term Memory (DBLSTM) structure is used to model the mapping from person-independent content to the acoustic features of the target person. F0 and aperiodicity are extracted from the original singing voice and used together with the acoustic features to reconstruct the target singing voice through a vocoder. In the converted singing voice, the target and source singers sound similar. To our knowledge, this is the first study to use non-parallel data to train a singing voice conversion system. Subjective evaluations demonstrate that the proposed method effectively converts singing voices.
https://arxiv.org/abs/1903.04124
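As a concrete sketch of the analysis/synthesis pipeline described above, the snippet below extracts F0, spectral envelope, and aperiodicity with the WORLD vocoder and resynthesizes a waveform. The pyworld and soundfile libraries, the file names, and the dblstm placeholder are our assumptions for illustration, not artifacts of the paper:

```python
import pyworld
import soundfile as sf

# Load a mono source recording (float64 samples, as pyworld expects).
x, fs = sf.read("source_singing.wav")

# WORLD analysis: F0 and aperiodicity are kept from the source; per the
# abstract, only the spectral envelope would be replaced by target-singer
# acoustics predicted from ASR phonetic posteriors by the DBLSTM.
f0, sp, ap = pyworld.wav2world(x, fs)
# sp = dblstm(phonetic_posteriors)  # hypothetical RNN mapping (not shown)

# Vocoder reconstruction of the (converted) singing voice.
y = pyworld.synthesize(f0, sp, ap, fs)
sf.write("converted_singing.wav", y, fs)
```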
We present a novel deep learning architecture in which the convolution operation leverages heterogeneous kernels. The proposed HetConv (Heterogeneous Kernel-Based Convolution) reduces the computation (FLOPs) and the number of parameters compared to the standard convolution operation while maintaining representational efficiency. To show the effectiveness of the proposed convolution, we present extensive experimental results on standard convolutional neural network (CNN) architectures such as VGG \cite{vgg2014very} and ResNet \cite{resnet}. We find that after replacing the standard convolutional filters in these architectures with our HetConv filters, we achieve a 3x to 8x FLOPs-based improvement in speed while maintaining (and sometimes improving) accuracy. We also compare the proposed convolution with group-wise/depth-wise convolutions and show that it achieves a greater FLOPs reduction with significantly higher accuracy.
https://arxiv.org/abs/1903.04120
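A minimal PyTorch sketch of the heterogeneous-kernel idea as we read it from the abstract (a 1/p fraction of input channels gets 3x3 kernels, the rest 1x1); this is our interpretation, not the authors' reference implementation:

```python
import torch
import torch.nn as nn

class HetConvSketch(nn.Module):
    """Heterogeneous convolution sketch: one input channel in every p is
    covered by a 3x3 kernel, the remainder by cheap 1x1 kernels."""
    def __init__(self, in_ch, out_ch, p=4):
        super().__init__()
        assert in_ch % p == 0
        self.p = p
        self.conv3 = nn.Conv2d(in_ch // p, out_ch, 3, padding=1, bias=False)
        self.conv1 = nn.Conv2d(in_ch - in_ch // p, out_ch, 1, bias=False)

    def forward(self, x):
        idx = torch.arange(x.size(1), device=x.device) % self.p == 0
        # Sum of the expensive (3x3) and cheap (1x1) branches.
        return self.conv3(x[:, idx]) + self.conv1(x[:, ~idx])
```

With p=4, three quarters of each filter's 3x3 kernels become 1x1 kernels, which is where the claimed FLOPs reduction would come from.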
Saliency prediction can benefit from training that involves scene understanding that may be tangential to the central task; this may include understanding places, spatial layout, and objects, or involve different datasets and their biases. Models can be combined, but doing so in a sophisticated manner can be complex and can result in unwieldy networks or competing objectives that are hard to balance. In this paper, we propose a scalable system that leverages multiple powerful deep CNN models to better extract visual features for saliency prediction. Our design differs from previous studies in that the whole system is trained in an almost end-to-end, piece-wise fashion. The encoder and decoder components are trained separately to deal with the complexity tied to the computational paradigm and required space. Furthermore, the encoder can contain more than one CNN model to extract features, and these models can have different architectures or be pre-trained on different datasets. This parallel design yields a better computational paradigm, overcoming limits on the variety of information or inference that can be combined at the encoder stage, and allowing deeper networks and a more powerful encoding. Our network can be expanded easily at almost no additional cost, and other pre-trained CNN models can be incorporated to draw on a wider range of visual knowledge. We denote our expandable multi-layer network as EML-NET, and our method achieves state-of-the-art results on the public saliency benchmarks SALICON, MIT300, and CAT2000.
http://arxiv.org/abs/1805.01047
Existing imitation learning approaches often require that complete demonstration data, including sequences of actions and states, are available. In this paper, we consider a more realistic and difficult scenario where a reinforcement learning agent only has access to the state sequences of an expert, while the expert's actions are unobserved. We propose a novel tensor-based model to infer the unobserved actions underlying the expert state sequences. The policy of the agent is then optimized via a hybrid objective combining reinforcement learning and imitation learning. We evaluated our hybrid approach on an illustrative domain and Atari games. The empirical results show that (1) the agents are able to leverage expert state sequences to learn faster than pure reinforcement learning baselines, (2) our tensor-based action inference model is advantageous compared to standard deep neural networks in inferring expert actions, and (3) the hybrid policy optimization objective is robust against noise in expert state sequences.
http://arxiv.org/abs/1903.04110
We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In backpropagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified so that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction ($k$ divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1-4% of the weights at each backpropagation pass without increasing the number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The code is available at https://github.com/lancopku/meProp
http://arxiv.org/abs/1706.06197
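The released code is linked above; as a self-contained illustration, here is a simplified PyTorch autograd function implementing the core top-$k$ idea (a sketch under our assumptions, not the meProp code itself):

```python
import torch

class TopKGrad(torch.autograd.Function):
    """Identity in the forward pass; in the backward pass, keep only the k
    largest-magnitude gradient entries per example and zero out the rest."""
    @staticmethod
    def forward(ctx, x, k):
        ctx.k = k
        return x

    @staticmethod
    def backward(ctx, grad_out):
        flat = grad_out.flatten(1)                    # (batch, features)
        _, idx = flat.abs().topk(ctx.k, dim=1)        # top-k magnitudes per row
        mask = torch.zeros_like(flat).scatter_(1, idx, 1.0)
        return (mask * flat).view_as(grad_out), None  # no gradient for k

# Placing h = TopKGrad.apply(layer(x), k) after a linear layer means only
# k rows/columns of that layer's weight matrix receive nonzero updates.
```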
Deep learning for object classification relies heavily on convolutional models. While effective, CNNs are rarely interpretable after the fact. An attention mechanism can be used to highlight the area of the image that the model focuses on, thus offering a narrow view into the mechanism of classification. We expand on this idea by forcing the method to explicitly align the images to be classified with reference images representing the classes. The mechanism of alignment is learned and therefore does not require that the reference objects resemble those being classified. Beyond explanation, our exemplar-based cross-alignment method enables classification with only a single example per category (one-shot). Our model cuts the 5-way, 1-shot error rate on Omniglot from 2.1% to 1.4% and on MiniImageNet from 53.5% to 46.5%, while simultaneously providing point-wise alignment information that offers some insight into what the network is capturing. This method of alignment also enables the recognition of an unsupported class (open-set) in the one-shot setting, maintaining an F1-score above 0.5 on Omniglot even with 19 other distracting classes, whereas baselines completely fail to separate the open-set class in this setting.
http://arxiv.org/abs/1903.06538
Fashion landmark detection is a challenging task even with current deep learning techniques, due to the large variation and non-rigid deformation of clothes. To tackle these problems, we propose the Spatial-Aware Non-Local (SANL) block, an attentive module in a deep neural network that can utilize spatial information while capturing global dependencies. The SANL block is constructed from the non-local block in a residual manner and learns spatially related representations by taking a spatial attention map from Grad-CAM. We then build our fashion landmark detection framework on a feature pyramid network, equipped with four SANL blocks in the backbone. Experimental results on two large-scale fashion datasets demonstrate that the proposed fashion landmark detection approach with SANL blocks considerably outperforms current state-of-the-art methods. Supplementary experiments on fine-grained image classification also show the effectiveness of the proposed SANL block.
https://arxiv.org/abs/1903.04104
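A hedged PyTorch sketch of a non-local block modulated by an external spatial attention map (e.g., one produced by Grad-CAM); where exactly the modulation enters is our guess from the abstract, not the paper's exact design:

```python
import torch
import torch.nn as nn

class SpatialAwareNonLocalSketch(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.theta = nn.Conv2d(ch, ch // 2, 1)   # query embedding
        self.phi = nn.Conv2d(ch, ch // 2, 1)     # key embedding
        self.g = nn.Conv2d(ch, ch // 2, 1)       # value embedding
        self.out = nn.Conv2d(ch // 2, ch, 1)

    def forward(self, x, spatial_map):
        # spatial_map: (b, 1, h, w) attention weights, e.g. from Grad-CAM
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2)              # (b, c/2, hw)
        k = self.phi(x * spatial_map).flatten(2)  # keys reweighted spatially
        v = self.g(x).flatten(2)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # (b, hw, hw)
        y = (v @ attn.transpose(1, 2)).view(b, c // 2, h, w)
        return x + self.out(y)                    # residual connection
```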
We provide a formal definition of blameworthiness in settings where multiple agents can collaborate to avoid a negative outcome. We first provide a method for ascribing blameworthiness to groups relative to an epistemic state (a distribution over causal models that describe how the outcome might arise). We then show how we can go from an ascription of blameworthiness for groups to an ascription of blameworthiness for individuals using a standard notion from cooperative game theory, the Shapley value. We believe that getting a good notion of blameworthiness in a group setting will be critical for designing autonomous agents that behave in a moral manner.
http://arxiv.org/abs/1903.04102
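For the group-to-individual step, the Shapley value admits a direct (exponential-time) computation; the sketch below assumes the group blameworthiness ascription is available as a characteristic function v over coalitions, which is our placeholder for the paper's epistemic-state construction:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values for a characteristic function v: frozenset -> float."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(n):
            for S in combinations(others, r):
                S = frozenset(S)
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[p] += w * (v(S | {p}) - v(S))  # marginal contribution
    return phi

# Toy example: an outcome any two of three agents could jointly avert.
print(shapley_values(["a", "b", "c"],
                     lambda S: 1.0 if len(S) >= 2 else 0.0))  # each gets 1/3
```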
With the recent advances in solving large, zero-sum extensive-form games, there is a growing interest in the inverse problem of inferring underlying game parameters given only access to agent actions. Although recent work provides a powerful differentiable end-to-end learning framework that embeds a game solver within a deep learning pipeline, allowing unknown game parameters to be learned via backpropagation, this framework faces significant limitations when applied to boundedly rational human agents and large-scale problems, limiting its practicality. In this paper, we address these limitations and propose a framework that is applicable to more practical settings. First, seeking to learn the rationality of human agents in complex two-player zero-sum games, we draw upon well-known ideas in decision theory to obtain a concise and interpretable agent behavior model, and derive solvers and gradients for end-to-end learning. Second, to scale up to large, real-world scenarios, we propose an efficient first-order primal-dual method that exploits the structure of extensive-form games, yielding significantly faster computation for both game solving and gradient computation. When tested on randomly generated games, we report speedups of orders of magnitude over previous approaches. We also demonstrate the effectiveness of our model on both real-world one-player settings and synthetic data.
http://arxiv.org/abs/1903.04101
Text removal algorithms have been proposed for uni-lingual scripts with regular shapes and layouts. However, to the best of our knowledge, no generic text removal method is available that can remove all or user-specified text regions regardless of font, script, language, or shape. Developing such a generic text eraser for real scenes is a challenging task, since it inherits all the challenges of multi-lingual and curved text detection and of inpainting. To fill this gap, we propose a mask-based text removal network (MTRNet). MTRNet is a conditional generative adversarial network (cGAN) with an auxiliary mask. The introduced auxiliary mask not only makes the cGAN a generic text eraser, but also enables stable training and early convergence on a challenging large-scale synthetic dataset initially proposed for text detection in real scenes. Moreover, MTRNet achieves state-of-the-art results on several real-world datasets, including ICDAR 2013, ICDAR 2017 MLT, and CTW1500, without being explicitly trained on this data, outperforming previous state-of-the-art methods trained directly on these datasets.
https://arxiv.org/abs/1903.04092
Vision-based activity recognition is essential for security, monitoring, and surveillance applications. Moreover, real-time analysis often involves low-quality video that carries little information about the surroundings due to poor illumination and occlusions, and therefore demands a more robust and integrated model for low-quality and night-time security operations. In this context, we propose a hybrid model for illumination-invariant human activity recognition based on sub-image histogram equalization enhancement and k-key-pose human silhouettes. The resulting feature vector gives good average recognition accuracy on three low-exposure video subsets of the original action video datasets. Finally, the performance of the proposed approach is tested on three manually degraded, low-quality datasets: Weizmann action, KTH, and Ballet Movement. The model outperforms existing techniques on low-exposure videos and achieves classification accuracy comparable to similar state-of-the-art methods.
https://arxiv.org/abs/1903.04090
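A small OpenCV sketch of the sub-image histogram equalization enhancement step, assuming 8-bit grayscale frames and an illustrative 4x4 tiling (the paper's exact tiling is not specified in the abstract):

```python
import cv2

def subimage_hist_eq(gray, tiles=(4, 4)):
    """Equalize each tile of a low-exposure grayscale frame independently."""
    h, w = gray.shape
    th, tw = h // tiles[0], w // tiles[1]
    out = gray.copy()
    for i in range(tiles[0]):
        for j in range(tiles[1]):
            patch = gray[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            out[i * th:(i + 1) * th, j * tw:(j + 1) * tw] = cv2.equalizeHist(patch)
    return out
```

OpenCV's cv2.createCLAHE offers a related tile-based equalization with contrast clipping, which may be preferable in practice.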
With the great achievements of artificial intelligence, vehicle technologies have advanced significantly from human-centric driving towards fully automated driving. An intelligent vehicle should be able to understand the driver's perception of the environment as well as the controlling behavior of the vehicle. Since high-definition digital map information is now available to provide rich environmental context about static roads, buildings, and traffic infrastructure, it is worthwhile to explore map data capabilities for driving task understanding. As an alternative to commercially used maps, OpenStreetMap (OSM) data is a free, open dataset, which makes it well suited for exploratory research. This study focuses on two tasks that leverage OSM for driving environment understanding. First, driving scenario attributes are retrieved from OSM elements and combined with vehicle dynamic signals for driving event recognition. Utilizing steering angle changes and a Bi-directional Recurrent Neural Network (Bi-RNN), a driving sequence is segmented and classified into lane-keeping, lane-change-left, lane-change-right, turn-left, and turn-right events. Second, for autonomous driving perception, OSM data can be used to render virtual street views, represented as prior knowledge to fuse with vision/laser systems for road semantic segmentation. Five different types of road masks are generated from OSM, images, and Lidar points, and fused to characterize the drivable space from the driver's perspective. As an alternative, a data-driven approach based on a Fully Convolutional Network (FCN) is considered, and the availability of OSM data for deep learning methods is discussed, revealing its potential for compensating street-view images and automating road semantic annotation.
http://arxiv.org/abs/1903.04084
Increasing rates of opioid drug abuse and heightened prevalence of online support communities underscore the necessity of employing data mining techniques to better understand drug addiction using these rapidly developing online resources. In this work, we obtain data from Reddit, an online collection of forums, to gather insight into drug use/misuse using text data from users themselves. Specifically, using user posts, we trained 1) a binary classifier which predicts transitions from casual drug discussion forums to drug recovery forums and 2) a Cox regression model that outputs likelihoods of such transitions. In doing so, we found that utterances of select drugs and certain linguistic features contained in one’s posts can help predict these transitions. Using unfiltered drug-related posts, our research delineates drugs that are associated with higher rates of transitions from recreational drug discussion to support/recovery discussion, offers insight into modern drug culture, and provides tools with potential applications in combating the opioid crisis.
http://arxiv.org/abs/1903.04081
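The Cox-regression part of the pipeline can be sketched with the lifelines library; the feature names and toy data below are our placeholders, standing in for the linguistic features and drug-mention counts extracted from user posts:

```python
import pandas as pd
from lifelines import CoxPHFitter

# One row per user: hypothetical features, observation time, and whether a
# transition to a recovery forum was observed.
df = pd.DataFrame({
    "first_person_pronouns": [0.04, 0.01, 0.07, 0.02, 0.05, 0.03],
    "opioid_mentions":       [3, 0, 5, 1, 0, 2],
    "days_observed":         [120, 400, 60, 365, 200, 90],
    "transitioned":          [1, 0, 1, 1, 0, 0],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="days_observed", event_col="transitioned")
cph.print_summary()  # hazard ratios: which features predict the transition
```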
Evolutionary computation methods have been successfully applied to neural networks for two decades, but they do not scale well to modern deep neural networks due to their complicated architectures and large numbers of connection weights. In this paper, we propose a new method that uses genetic algorithms to evolve the architectures and connection weight initialization values of deep convolutional neural networks for image classification problems. In the proposed algorithm, an efficient variable-length gene encoding strategy is designed to represent the different building blocks and the unpredictable optimal depth of convolutional neural networks. In addition, a new representation scheme is developed for effectively initializing the connection weights of deep convolutional neural networks, which is expected to prevent networks from becoming trapped in local minima, typically a major issue in backward gradient-based optimization. Furthermore, a novel fitness evaluation method is proposed to speed up the heuristic search with substantially fewer computational resources. The proposed algorithm is examined and compared with 22 existing algorithms, including state-of-the-art methods, on nine widely used image classification tasks. The experimental results demonstrate the remarkable superiority of the proposed algorithm over the state-of-the-art algorithms in terms of classification error rate and number of parameters (weights).
http://arxiv.org/abs/1710.10741
Recent studies have highlighted audio adversarial examples as a ubiquitous threat to state-of-the-art automatic speech recognition systems. Nonetheless, the efficiency and robustness of existing works are not yet satisfactory due to the large search space of audio. In this paper, we introduce the first study of weighted-sampling audio adversarial examples, specifically focusing on the number and positions of distortions to reduce the search space. We also propose a new attack scenario, the audio injection attack, which offers novel insights into the concealment of adversarial attacks. Our experimental study shows that we can generate audio adversarial examples with low noise and high robustness within minutes, compared to other state-of-the-art methods that require hours. We encourage readers to listen to these audio adversarial examples on an anonymous website.
http://arxiv.org/abs/1901.10300
In this work, we connect two distinct concepts for unsupervised domain adaptation: feature distribution alignment between domains by utilizing the task-specific decision boundary, and the Wasserstein metric. Our proposed sliced Wasserstein discrepancy (SWD) is designed to capture the natural notion of dissimilarity between the outputs of task-specific classifiers. It provides geometrically meaningful guidance for detecting target samples that are far from the support of the source, and enables efficient distribution alignment in an end-to-end trainable fashion. In the experiments, we validate the effectiveness and generality of our method on digit and sign recognition, image classification, semantic segmentation, and object detection.
https://arxiv.org/abs/1903.04064
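The sliced Wasserstein discrepancy has a particularly simple form: project the two classifiers' outputs onto random unit directions and compare the sorted projections (the 1-D Wasserstein distance reduces to sorting). A PyTorch sketch, with the number of projections as a free parameter:

```python
import torch

def sliced_wasserstein_discrepancy(p1, p2, n_proj=128):
    """SWD between two (batch x classes) classifier outputs on the same batch."""
    theta = torch.randn(p1.size(1), n_proj, device=p1.device)
    theta = theta / theta.norm(dim=0, keepdim=True)   # random unit directions
    proj1 = (p1 @ theta).sort(dim=0)[0]               # sorted 1-D projections
    proj2 = (p2 @ theta).sort(dim=0)[0]
    return ((proj1 - proj2) ** 2).mean()
```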
Training end-to-end deep robot policies requires a lot of domain-, task-, and hardware-specific data, which is often costly to provide. In this work, we propose to tackle this issue by employing a deep neural network with a modular architecture, consisting of separate perception, policy, and trajectory parts. Each part of the system is trained fully on synthetic data or in simulation. The data is exchanged between parts of the system as low-dimensional latent representations of affordances and trajectories. The performance is then evaluated in a zero-shot transfer scenario using a Franka Panda robot arm. The results demonstrate that a low-dimensional representation of scene affordances extracted from an RGB image is sufficient to successfully train manipulator policies. We also introduce a method for affordance dataset generation, which generalizes easily to new tasks, objects, and environments, and requires no manual pixel labeling.
https://arxiv.org/abs/1903.04053
Electric Vehicle (EV) sharing systems have recently experienced unprecedented growth across the globe. During this fast expansion, one fundamental determinant of success is the capability of dynamically predicting station demand as the entire system evolves continuously. This dynamic demand prediction problem presents several challenges. First, unlike most existing work, which predicts demand only for static systems or at a few stages of expansion, in the real world we often need to predict demand as, or even before, stations are being deployed or closed, to provide information and support for decision making. Second, for stations yet to be deployed, there is no historical record or additional mobility data available to help predict their demand. Finally, the impact of deploying or closing stations on the remaining stations in the system can be very complex. To address these challenges, in this paper we propose a novel dynamic demand prediction approach based on graph sequence learning, which models the dynamics during system expansion and predicts demand accordingly. We use a local temporal encoding process to handle the available historical data at individual stations, and a dynamic spatial encoding process to take correlations between stations into account with graph convolutional neural networks. The encoded features are fed to a multi-scale prediction network, which forecasts both the long-term expected demand of the stations and their instant demand in the near future. We evaluate the proposed approach on one year of real-world data collected from a major EV sharing platform in Shanghai. Experimental results demonstrate that our approach significantly outperforms the state of the art, showing up to a three-fold performance gain in predicting demand for the rapidly expanding EV sharing system.
http://arxiv.org/abs/1903.04051
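The dynamic spatial encoding rests on graph convolutions over the station graph; as a reference point, a minimal (Kipf-and-Welling-style) graph convolution layer looks like the following, with the adjacency matrix and feature sizes as placeholders rather than the paper's exact design:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One mean-aggregation graph convolution over the station graph."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (stations, in_dim); adj: (stations, stations) with self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.lin((adj @ x) / deg))
```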
In this paper, we study proof systems in the sense of Cook-Reckhow for problems that are higher in the polynomial hierarchy than coNP, in particular #SAT and maxSAT. We start by explaining how the notion of Cook-Reckhow proof systems can be applied to these problems and show how one can adapt existing languages from knowledge compilation, such as decision-DNNF, so that they can be seen as proof systems for problems such as #SAT and maxSAT.
http://arxiv.org/abs/1903.04039
State-of-the-art automated segmentation algorithms are not 100\% accurate, especially when segmenting difficult-to-interpret datasets such as those with severe osteoarthritis (OA). We present a novel interactive method called just-enough interaction (JEI), which adds a fast correction step to the automated layered optimal graph segmentation of multiple objects and surfaces (LOGISMOS). After LOGISMOS segmentation in knee MRI, the JEI user interaction does not modify the boundary surfaces of the bones and cartilages directly. Instead, local costs of the underlying graph nodes are modified and the graph is re-optimized, providing globally optimal corrected results. A significant performance improvement ($p \ll 0.001$) was observed when comparing JEI-corrected results to the automated ones. The algorithm was extended from 3D JEI to longitudinal multi-3D (4D) JEI, allowing simultaneous visualization of and interaction with multiple time points of the same patient.
https://arxiv.org/abs/1903.04027
Our goal is to enable robots to learn cost functions from user guidance. Often it is difficult or impossible for users to provide full demonstrations, so corrections have emerged as an easier guidance channel. However, when robots learn cost functions from corrections rather than demonstrations, they have to extrapolate a small amount of information, the change of a waypoint along the way, to the rest of the trajectory. We cast this extrapolation problem as online function approximation, which exposes different ways in which the robot can interpret what trajectory the person intended, depending on the function space used for the approximation. Our simulation results and user study suggest that using function spaces with non-Euclidean norms can better capture what users intend, particularly if environments are uncluttered. This, in turn, can lead to the robot learning a more accurate cost function and improve the user's subjective perception of the robot.
http://arxiv.org/abs/1812.01225
Stereo matching estimates the disparity between a rectified image pair, which is of great importance to depth sensing, autonomous driving, and other related tasks. Previous works built cost volumes with cross-correlation or concatenation of left and right features across all disparity levels, and then utilized a 2D or 3D convolutional neural network to regress the disparity maps. In this paper, we propose to construct the cost volume by group-wise correlation. The left and right features are divided into groups along the channel dimension, and correlation maps are computed within each group to obtain multiple matching cost proposals, which are then packed into a cost volume. Group-wise correlation provides an efficient representation for measuring feature similarities without losing as much information as full correlation, and it preserves better performance than previous methods when the number of parameters is reduced. The 3D stacked hourglass network proposed in previous works is improved to boost performance and decrease the inference cost. Experimental results show that our method outperforms previous methods on the Scene Flow, KITTI 2012, and KITTI 2015 datasets. The code is available at this https URL
https://arxiv.org/abs/1903.04025
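The group-wise correlation volume is straightforward to write down; the sketch below follows the abstract's description (channels split into groups, correlation per group at every disparity), with shapes and the disparity range as illustrative assumptions rather than the released code:

```python
import torch

def groupwise_correlation_volume(left, right, max_disp=48, groups=8):
    """Build a (b, groups, max_disp, h, w) cost volume from left/right features."""
    b, c, h, w = left.shape
    lg = left.view(b, groups, c // groups, h, w)
    rg = right.view(b, groups, c // groups, h, w)
    volume = left.new_zeros(b, groups, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, :, d] = (lg * rg).mean(dim=2)
        else:
            # correlate left pixels with right pixels shifted by disparity d
            volume[:, :, d, :, d:] = (lg[..., d:] * rg[..., :-d]).mean(dim=2)
    return volume  # regressed to disparities by the 3D hourglass network
```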
We present a deep reinforcement learning method of progressive view inpainting for 3D point scene completion under volume guidance, achieving high-quality scene reconstruction from only a single depth image with severe occlusion. Our approach is end-to-end and consists of three modules: 3D scene volume reconstruction, 2D depth map inpainting, and multi-view selection for completion. Given a single depth image, our method first goes through the 3D volume branch to obtain a volumetric scene reconstruction as a guide for the next view inpainting step, which attempts to make up the missing information; the third step involves projecting the volume under the same view as the input, concatenating the two to complete the current-view depth map, and integrating all depth maps into the point cloud. Since the occluded areas are not directly observable, we resort to a deep Q-Network to glance around and pick the next best view for large hole completion progressively, until the scene is adequately reconstructed while guaranteeing validity. All steps are learned jointly to achieve robust and consistent results. We perform qualitative and quantitative evaluations with extensive experiments on the SUNCG data, obtaining better results than the state of the art.
https://arxiv.org/abs/1903.04019
Every new privacy regulation brings along the question of whether it actually improves privacy for users. The EU General Data Protection Regulation (GDPR) is one of the most demanding and comprehensive privacy regulations of all time. Hence, a few months after it went into effect, it is natural to study its impact on the landscape of privacy policies online. In this work, we conduct the first longitudinal, in-depth, and at-scale assessment of privacy policies before and after the GDPR. We gauge the complete consumption cycle of these policies, from the first user impressions to the compliance assessment. We create a diverse corpus of 3,686 English-language privacy policies, for which we fetch the pre-GDPR and post-GDPR versions. Our user study, with 460 participants on Amazon MTurk, does not indicate a significant change in the visual representation of privacy policies from the users' perspective. We also find that the readability of privacy policies suffers under the GDPR, with almost 23% more sentences and words, despite efforts to reduce the reliance on passive sentences. We further develop a new workflow for the automated assessment of requirements in privacy policies, building on automated natural language processing techniques. Using this workflow, we show that privacy policies cover more data practices, particularly around data retention, user choice, and specific audiences, and that an average of 16.5% of the policies improved across seven compliance metrics. Finally, we also assess how transparent organizations are about their privacy practices by performing a specificity analysis. In this analysis, we find evidence for positive changes triggered by the GDPR, with the specificity level, averaged over eight metrics, improving in over 19.4% of the policies.
http://arxiv.org/abs/1809.08396
Many efforts have been made to use automated approaches, such as natural language processing (NLP), to mine or extract data from free-text medical records and build comprehensive patient profiles for delivering better health care. Reusing NLP models in new settings, however, remains cumbersome, as it requires validation and/or retraining on new data iteratively to achieve convergent results. In this paper, we formally define and analyse the NLP model adaptation problem, particularly in phenotype identification tasks, and identify two types of common unnecessary or wasted effort: duplicate waste and imbalance waste. A distributed representation approach is proposed to represent familiar language patterns for an NLP model by learning phenotype embeddings from its training data. Computations on these language patterns are then introduced to help avoid or reduce unnecessary effort by combining both geometric and semantic similarities. To evaluate the approach, we cross-validate NLP models developed for six physical morbidity studies (23 phenotypes; 17 million documents) on anonymised medical records of the South London and Maudsley NHS Trust, United Kingdom. Two metrics are introduced to quantify the reductions in both duplicate and imbalance waste. We conducted various experiments on reusing NLP models in four phenotype identification tasks. Our approach can choose the best model for a given new task, identifying up to 76% of mentions that need no validation or model retraining while maintaining very good performance (93-97% accuracy). It can also provide guidance for validating and retraining the model on novel language patterns in new tasks, saving around 80% of the effort required by blind model-adaptation approaches.
https://arxiv.org/abs/1903.03995
Process-mining techniques aim to use event data about past executions to gain insight into how processes are executed. While these techniques have proven very valuable, they are less successful in reaching their goal if the process is flexible and events can potentially occur in any order. Furthermore, information systems may record events at a very low level, which does not match the high-level concepts known at the business level. Without abstracting sequences of events to high-level concepts, the results of applying process mining (e.g., discovered models) easily become very complex and difficult to interpret, which ultimately means they are of little use. A large body of research exists on event abstraction, but it typically requires a large amount of domain knowledge, which is often not readily available. Other abstraction techniques are unsupervised, which yields lower accuracy. This paper puts forward a technique that requires only limited, easily provided domain knowledge. Traces are divided into sessions, and each session is abstracted as a single high-level activity execution. The abstraction is based on a combination of automatic clustering and visualization methods. The technique was assessed on two case studies that exhibit a large amount of behavioral variability. The results clearly illustrate the benefits of the abstraction in conveying knowledge to stakeholders.
http://arxiv.org/abs/1903.03993
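The session-splitting step can be illustrated in a few lines; the inactivity-gap criterion below is an assumption on our part (the paper then combines clustering and visualization to abstract each session, which is not shown here):

```python
from datetime import timedelta

def split_sessions(trace, gap=timedelta(minutes=30)):
    """Split one trace (a time-ordered list of events, each a dict with a
    'time' key) into sessions at inactivity gaps longer than `gap`."""
    sessions, current = [], [trace[0]]
    for prev, ev in zip(trace, trace[1:]):
        if ev["time"] - prev["time"] > gap:
            sessions.append(current)
            current = []
        current.append(ev)
    sessions.append(current)
    return sessions  # each session is then abstracted to one high-level activity
```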
The inputs to a deep neural network (DNN) from real-world data usually come with uncertainties, yet it is challenging to propagate the uncertainty in the input features to the DNN predictions at low computational cost. This work employs a gradient-based subspace method and a response surface technique to accelerate uncertainty propagation in DNNs. Specifically, the active subspace method is employed to identify the most important subspace in the input features using the gradient of the DNN output with respect to the inputs. The response surface within that low-dimensional subspace can then be built efficiently, and the uncertainty of the prediction can be acquired by evaluating the computationally cheap response surface instead of the DNN models. In addition, the subspace can help explain adversarial examples. The approach is demonstrated on the MNIST dataset with a convolutional neural network.
https://arxiv.org/abs/1903.03989
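The active subspace construction itself is compact: average the outer products of input gradients and keep the leading eigenvectors. A PyTorch sketch, assuming a scalar model output (e.g., a single logit); the model and inputs are placeholders:

```python
import torch

def active_subspace(model, inputs, k=2):
    """Top-k active-subspace directions from input gradients of a scalar output."""
    grads = []
    for x in inputs:
        x = x.clone().requires_grad_(True)
        model(x.unsqueeze(0)).squeeze().backward()
        grads.append(x.grad.flatten())
    G = torch.stack(grads)             # (n_samples, n_features)
    C = G.t() @ G / G.size(0)          # empirical gradient covariance
    _, eigvecs = torch.linalg.eigh(C)  # eigenvalues in ascending order
    return eigvecs[:, -k:]             # directions of largest output sensitivity
```

The cheap response surface is then fit over projections of the inputs onto these directions.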
This work investigates multiple approaches to Named Entity Recognition (NER) for text in Electronic Health Record (EHR) data. In particular, we look into the application of (i) rule-based, (ii) deep learning, and (iii) transfer learning systems for the task of NER on brain imaging reports, with a focus on records from patients with stroke. We explore the strengths and weaknesses of each approach, develop rules and train on a common dataset, and evaluate each system's performance on common test sets of Scottish radiology reports from two sources (brain imaging reports in ESS, the Edinburgh Stroke Study data collected by NHS Lothian, as well as radiology reports created in NHS Tayside). Our comparison shows that a hand-crafted system is the most accurate way to automatically label EHR data, but machine learning approaches can provide a feasible alternative where resources for a manual system are not readily available.
https://arxiv.org/abs/1903.03985
In this paper, we analyze the performance of an agent developed according to a well-accepted appraisal theory of human emotion with respect to how it modulates play in the context of a social dilemma. We ask if the agent will be capable of generating interactions that are considered to be more human than machine-like. We conduct an experiment with 117 participants and show how participants rate our agent on dimensions of human-uniqueness (which separates humans from animals) and human-nature (which separates humans from machines). We show that our appraisal theoretic agent is perceived to be more human-like than baseline models, by significantly improving both human-nature and human-uniqueness aspects of the intelligent agent. We also show that perception of humanness positively affects enjoyment and cooperation in the social dilemma.
http://arxiv.org/abs/1903.03980
This paper presents a novel phase reconstruction method, from a given amplitude spectrogram only, by combining a signal-processing-based approach and a deep neural network (DNN). To retrieve a time-domain signal from its amplitude spectrogram, the corresponding phase is required. One of the popular phase reconstruction methods is the Griffin-Lim algorithm (GLA), which is based on the redundancy of the short-time Fourier transform. However, GLA often requires many iterations and produces low-quality signals owing to the lack of prior knowledge of the target signal. To address these issues, we propose an architecture that stacks sub-blocks, each comprising two GLA-inspired fixed layers and a DNN. The number of stacked sub-blocks is adjustable, so performance and computational load can be traded off based on application requirements. The effectiveness of the proposed method is investigated by reconstructing phases from amplitude spectrograms of speech.
https://arxiv.org/abs/1903.03971
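For reference, the plain Griffin-Lim baseline that the fixed layers are inspired by alternates between enforcing the given magnitude and STFT consistency; a PyTorch sketch with illustrative parameters:

```python
import torch

def griffin_lim(mag, n_fft=1024, hop=256, n_iter=32):
    """Reconstruct a waveform from a magnitude spectrogram (freq x frames)."""
    window = torch.hann_window(n_fft)
    phase = torch.rand_like(mag) * 2 * torch.pi   # random initial phase
    for _ in range(n_iter):
        spec = mag * torch.exp(1j * phase)        # enforce given magnitude
        x = torch.istft(spec, n_fft, hop, window=window)
        phase = torch.stft(x, n_fft, hop, window=window,
                           return_complex=True).angle()  # consistent phase
    return torch.istft(mag * torch.exp(1j * phase), n_fft, hop, window=window)
```

The proposed architecture interleaves such fixed GLA-inspired layers with trainable DNN sub-blocks, so the number of sub-blocks plays the role of the iteration count here.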
Visual localization is one of the primary capabilities for mobile robots. Long-term visual localization in real time is particularly challenging, as the robot is required to localize itself efficiently using visual data whose appearance may change significantly over time. In this paper, we propose a cloud-based visual localization system targeting long-term localization in real time. On the robot, we employ two estimators to achieve accurate and real-time performance. One is a sliding-window-based visual-inertial odometry, which integrates constraints from consecutive observations and self-motion measurements, as well as constraints induced by localization on the cloud. This estimator builds a local visual submap as a virtual observation, which is then sent to the cloud as new localization constraints. The other is a delayed-state Extended Kalman Filter that fuses the pose of the robot localized from the cloud, the local odometry, and the high-frequency inertial measurements. On the cloud, we propose a longer-sliding-window-based localization method to aggregate the virtual observations for a larger field of view, leading to more robust alignment between virtual observations and the map. Under this architecture, the robot can achieve drift-free, real-time localization using onboard resources, even in a network with limited bandwidth, high latency, and packet loss, enabling autonomous navigation in real-world environments. We evaluate the effectiveness of our system on a dataset with challenging seasonal and illumination variations, and further validate the robustness of the system under challenging network conditions.
https://arxiv.org/abs/1903.03968
Visual localization has attracted considerable attention due to its low-cost and stable sensors, which are desired in many applications, such as autonomous driving, inspection robots, and unmanned aerial vehicles. However, current visual localization methods still struggle with environmental changes across weather conditions and seasons, as there is significant appearance variation between the map and the query image. The crucial challenge in this situation is that the percentage of outliers, i.e., incorrect feature matches, is high. In this paper, we derive minimal closed-form solutions for 3D-2D localization with the aid of inertial measurements, using only 2 pairs of point matches or 1 pair of point matches and 1 pair of line matches. These solutions are further utilized in the proposed 2-entity RANSAC, which is more robust to outliers because both line and point features can be used simultaneously and the number of matches required for pose calculation is reduced. Furthermore, we introduce three feature sampling strategies with different advantages, enabling an automatic selection mechanism. With this mechanism, our 2-entity RANSAC can adapt to environments with different distributions of feature types across segments. Finally, we evaluate the method on both synthetic and real-world datasets, validating its performance and effectiveness in inter-session scenarios.
https://arxiv.org/abs/1903.03967
The problem of estimating subjective visual properties (SVP) of images (e.g., shoe A is more comfortable than shoe B) is gaining increasing attention. Due to its highly subjective nature, different annotators often exhibit different interpretations of scales when adopting absolute value tests, so recent investigations turn to collecting pairwise comparisons via crowdsourcing platforms. However, crowdsourced data usually contains outliers, so it is desirable to develop a robust model for learning SVP from crowdsourced noisy annotations. In this paper, we construct a deep SVP prediction model that not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations. Specifically, we construct a comparison multi-graph based on the collected annotations, where different labeling results correspond to edges with different directions between two vertices. We then propose a generalized deep probabilistic framework consisting of an SVP prediction module and an outlier modeling module that work collaboratively and are optimized jointly. Extensive experiments on various benchmark datasets demonstrate that our new approach delivers promising results.
https://arxiv.org/abs/1903.03956
Accurate state estimation is a fundamental component of robotic control. In robotic manipulation tasks, which are our focus in this work, state estimation is essential for identifying the positions of objects in the scene, forming the basis of the manipulation plan. However, pose estimation typically requires expensive 3D cameras or additional instrumentation such as fiducial markers to perform accurately. Recently, Tobin et al. introduced an approach to pose estimation based on domain randomization, where a neural network is trained to predict pose directly from a 2D image of the scene. The network is trained on computer-generated images with high variation in textures and lighting, thereby generalizing to real-world images. In this work, we investigate how to improve the accuracy of domain randomization based pose estimation. Our main idea is that active perception, moving the robot to get a better estimate of pose, can be trained in simulation and transferred to the real world using domain randomization. In our approach, the robot learns in a domain-randomized simulation how to estimate pose from a sequence of images. We show that our approach can significantly improve the accuracy of standard pose estimation in several scenarios: when the robot holding an object moves, when reference objects are moved in the scene, or when the camera is moved around the object.
https://arxiv.org/abs/1903.03953
Health management of complex dynamic systems has traditionally evolved separately from automated control, planning, and scheduling (generally referred to in the paper as decision making). A goal of Integrated System Health Management has been to enable coordination between system health management and decision making, although successful practical implementations have remained limited. This paper proposes that, rather than being treated as connected, yet distinct entities, system health management and decision making should be unified in their formulations. Enabled by advances in modeling and computing, we argue that the unified approach will increase a system’s operational effectiveness and may also lead to a lower overall system complexity. We overview the prevalent system health management methodology and illustrate its limitations through numerical examples. We then describe the proposed unification approach and show how it accommodates the typical system health management concepts.
http://arxiv.org/abs/1903.03948
In this paper, we develop a functional Unmanned Aerial Vehicle (UAV) capable of tracking an object using a machine learning vision system, a Haar feature-based cascade classifier. The image processing is performed on board with a high-performance single-board computer. Based on the detected object and its position, the quadrotor must track it so as to remain centered on it and at a safe distance from it. The object in question is a human face; the experiments were conducted with a two-step detection, first searching for the upper body and then searching for the face inside the detected upper-body area. Once the human face is detected, the quadrotor must follow it automatically. Experiments were conducted that show the effectiveness of our methodology; the results are shown in a video.
https://arxiv.org/abs/1903.03947
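The two-step detection is directly expressible with OpenCV's bundled Haar cascades (the cascade file names below are standard OpenCV assets, not files from the paper, and the control hand-off is only indicated):

```python
import cv2

body_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_upperbody.xml")
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(frame):
    """Find an upper body first, then search for a face inside it."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in body_cascade.detectMultiScale(gray, 1.1, 4):
        faces = face_cascade.detectMultiScale(gray[y:y + h, x:x + w], 1.1, 4)
        if len(faces) > 0:
            fx, fy, fw, fh = faces[0]
            return x + fx, y + fy, fw, fh   # face box in frame coordinates
    return None  # the quadrotor controller would hold position here
```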
Software repositories contain large amounts of textual data, ranging from source code comments and issue descriptions to questions, answers, and comments on Stack Overflow. To make sense of this textual data, topic modelling is frequently used as a text-mining tool for the discovery of hidden semantic structures in text bodies. Latent Dirichlet allocation (LDA) is a commonly used topic model that aims to explain the structure of a corpus by grouping texts. LDA requires multiple parameters to work well, and there are only rough and sometimes conflicting guidelines available on how these parameters should be set. In this paper, we contribute (i) a broad study of parameters to arrive at good local optima for GitHub and Stack Overflow text corpora, (ii) an a-posteriori characterisation of text corpora related to eight programming languages, and (iii) an analysis of corpus feature importance via per-corpus LDA configuration. We find that (1) popular rules of thumb for topic modelling parameter configuration are not applicable to the corpora used in our experiments, (2) corpora sampled from GitHub and Stack Overflow have different characteristics and require different configurations to achieve good model fit, and (3) we can predict good configurations for unseen corpora reliably. These findings support researchers and practitioners in efficiently determining suitable configurations for topic modelling when analysing textual data contained in software repositories.
http://arxiv.org/abs/1804.04749
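The parameters under study are exactly the ones exposed by standard LDA implementations; in scikit-learn terms, with a toy corpus and illustrative values standing in for the GitHub/Stack Overflow data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["null pointer exception in java thread",
        "git rebase vs merge workflow question",
        "python pandas dataframe groupby example"]
X = CountVectorizer(stop_words="english").fit_transform(docs)

lda = LatentDirichletAllocation(
    n_components=2,         # number of topics
    doc_topic_prior=0.1,    # alpha
    topic_word_prior=0.01,  # beta/eta
    random_state=0,
).fit(X)
print(lda.perplexity(X))    # one proxy for model fit, compared across configs
```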
In this paper, we develop a modified differential Structure from Motion (SfM) algorithm that can estimate relative pose from two consecutive frames despite Rolling Shutter (RS) artifacts. In particular, we show that under a constant velocity assumption, the errors induced by the rolling shutter effect can be easily rectified by a linear scaling operation on each optical flow. We further propose a 9-point algorithm to recover the relative pose of a rolling shutter camera undergoing constant acceleration motion. We demonstrate that the dense depth maps recovered from the relative pose of the RS camera can be used in an RS-aware warping for image rectification to recover high-quality Global Shutter (GS) images. Experiments on both synthetic and real RS images show that our RS-aware differential SfM algorithm produces more accurate results on relative pose estimation and 3D reconstruction from images distorted by the RS effect, compared to standard SfM algorithms that assume a GS camera model. We also demonstrate that our RS-aware warping for image rectification outperforms state-of-the-art commercial software products, i.e., Adobe After Effects and Apple iMovie, at removing RS artifacts.
https://arxiv.org/abs/1903.03943
A key problem in blind image quality assessment (BIQA) is how to effectively model the properties of the human visual system in a data-driven manner. In this paper, we address this problem with a simple and efficient BIQA model based on a novel framework consisting of a fully convolutional neural network (FCNN) and a pooling network. In principle, the FCNN is capable of predicting a pixel-by-pixel similarity quality map from a distorted image alone, by leveraging the intermediate similarity maps derived from conventional full-reference image quality assessment methods. The predicted pixel-by-pixel quality maps are consistent with the distortion correlations between the reference and distorted images. Finally, a deep pooling network regresses the quality map into a score. Experiments demonstrate that our predictions outperform many state-of-the-art BIQA methods.
http://arxiv.org/abs/1805.08493
We study how to generate captions that are not only accurate in describing an image but also discriminative across different images. The problem is both fundamental and interesting, as most machine-generated captions, despite phenomenal research progress in the past several years, are expressed in a very monotonic and featureless format. While such captions are normally accurate, they often lack important characteristics of human language: distinctiveness for each caption and diversity across different images. To address this problem, we propose a novel conditional generative adversarial network for generating diverse captions across images. Instead of estimating the quality of a caption solely on one image, the proposed comparative adversarial learning framework better assesses the quality of captions by comparing a set of captions within the image-caption joint space. By contrasting with human-written captions and image-mismatched captions, the caption generator effectively exploits the inherent characteristics of human language and generates more discriminative captions. We show that our proposed network is capable of producing accurate and diverse captions across images.
https://arxiv.org/abs/1804.00861
We present a fully automated learning-based approach for segmenting knee cartilage in the presence of osteoarthritis (OA). The algorithm employs a hierarchical set of two random forest classifiers. The first is a neighborhood approximation forest, whose output probability map is used as a feature set for the second random forest (RF) classifier. The output probabilities of the hierarchical approach are used as cost functions in a Layered Optimal Graph Segmentation of Multiple Objects and Surfaces (LOGISMOS). In this work, we highlight a novel post-processing interaction called just-enough interaction (JEI), which enables quick and accurate generation of a large set of training examples. Disjoint sets of 15 and 13 subjects were used for training, and the method was tested on another disjoint set of 53 knee datasets. All images were acquired using a double echo steady state (DESS) MRI sequence and are from the Osteoarthritis Initiative (OAI) database. Segmentation using the learning-based cost function showed a significant reduction in segmentation errors ($p< 0.05$) compared with conventional gradient-based cost functions.
https://arxiv.org/abs/1903.03929
A fully automated knee MRI segmentation method to study osteoarthritis (OA) was developed using a novel hierarchical set of random forest (RF) classifiers to learn the appearance of cartilage regions and their boundaries. A neighborhood approximation forest is used first to provide contextual features to the second-level RF classifier, which also considers local features and produces location-specific costs for the layered optimal graph image segmentation of multiple objects and surfaces (LOGISMOS) framework. The double echo steady state (DESS) MRIs used in this work originated from the Osteoarthritis Initiative (OAI) study. Trained on 34 MRIs with varying degrees of OA, the performance of the learning-based method tested on 108 MRIs showed a significant reduction in segmentation errors (\emph{p}$<$0.05) compared with the conventional gradient-based and single-stage RF-learned costs. The 3D LOGISMOS was extended to longitudinal-3D (4D) to simultaneously segment multiple follow-up visits of the same patient. As such, data from all time points of the temporal sequence contribute information to a single optimal solution that utilizes both spatial 3D and temporal contexts. 4D LOGISMOS validation on 108 MRIs from baseline and 12-month follow-up scans of 54 patients showed a significant reduction in segmentation errors (\emph{p}$<$0.01) compared to 3D. Finally, the potential of 4D LOGISMOS was further explored on the same 54 patients using 5 annual follow-up scans, demonstrating a significant improvement in measuring cartilage thickness (\emph{p}$<$0.01) compared to the sequential 3D approach.
https://arxiv.org/abs/1903.03927
Modern cyber-physical systems (e.g., robotics systems) are typically composed of physical and software components, the characteristics of which are likely to change over time. Assumptions about parts of the system made at design time may not hold at run time, especially when a system is deployed for long periods (e.g., over decades). Self-adaptation is designed to find reconfigurations of systems to handle such run-time inconsistencies. Planners can be used to find and enact optimal reconfigurations in such an evolving context. However, for systems that are highly configurable, such planning becomes intractable due to the size of the adaptation space. To overcome this challenge, in this paper we explore an approach that (a) uses machine learning to find Pareto-optimal configurations without needing to explore every configuration and (b) restricts the search space to such configurations to make planning tractable. We explore this in the context of robot missions that need to consider task timeliness and energy consumption. An independent evaluation shows that our approach results in high-quality adaptation plans in uncertain and adversarial environments.
http://arxiv.org/abs/1903.03920
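Restricting the planner to Pareto-optimal configurations only needs a non-dominance filter; a minimal sketch with (time, energy) objectives, both minimized, mirroring the mission trade-off in the abstract (names and values are illustrative):

```python
def pareto_front(configs):
    """Keep configurations whose objective tuple no other config dominates."""
    front = []
    for name, obj in configs:
        dominated = any(other != obj and
                        all(o2 <= o1 for o1, o2 in zip(obj, other))
                        for _, other in configs)
        if not dominated:
            front.append((name, obj))
    return front

# The planner searches only the survivors:
print(pareto_front([("fast", (10, 9)), ("eco", (20, 4)), ("slow", (22, 9))]))
```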
For the task of mobility analysis of 3D shapes, we propose joint analysis for simultaneous motion part segmentation and motion attribute estimation, taking a single 3D model as input. The problem is significantly different from those tackled in the existing works which assume the availability of either a pre-existing shape segmentation or multiple 3D models in different motion states. To that end, we develop Shape2Motion which takes a single 3D point cloud as input, and jointly computes a mobility-oriented segmentation and the associated motion attributes. Shape2Motion is comprised of two deep neural networks designed for mobility proposal generation and mobility optimization, respectively. The key contribution of these networks is the novel motion-driven features and losses used in both motion part segmentation and motion attribute estimation. This is based on the observation that the movement of a functional part preserves the shape structure. We evaluate Shape2Motion with a newly proposed benchmark for mobility analysis of 3D shapes. Results demonstrate that our method achieves the state-of-the-art performance both in terms of motion part segmentation and motion attribute estimation.
https://arxiv.org/abs/1903.03911
Stochastic partition models divide a multi-dimensional space into a number of rectangular regions, such that the data within each region exhibit certain types of homogeneity. Due to the nature of their partition strategy, existing partition models may create many unnecessary divisions in sparse regions when trying to describe data in dense regions. To avoid this problem we introduce a new parsimonious partition model, the Rectangular Bounding Process (RBP), to efficiently partition multi-dimensional spaces, by employing a bounding strategy to enclose data points within rectangular bounding boxes. Unlike existing approaches, the RBP possesses several attractive theoretical properties that make it a powerful nonparametric partition prior on a hypercube. In particular, the RBP is self-consistent and as such can be directly extended from a finite hypercube to infinite (unbounded) space. We apply the RBP to regression trees and relational models as a flexible partition prior. The experimental results validate the merit of the RBP in rich yet parsimonious expressiveness compared to the state-of-the-art methods.
http://arxiv.org/abs/1903.03906
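To convey the geometry of the bounding strategy (though not the RBP's actual self-consistent prior), here is a toy sampler of axis-aligned boxes on the unit hypercube together with a coverage test:

```python
import numpy as np

def sample_boxes(n_boxes, dim, rng):
    """Random axis-aligned boxes contained in [0, 1]^dim."""
    lo = rng.uniform(0, 1, size=(n_boxes, dim))
    hi = lo + rng.uniform(0, 1 - lo)   # upper corners stay inside the cube
    return lo, hi

def covered(points, lo, hi):
    """Boolean (points x boxes) matrix: point i lies inside box j."""
    return ((points[:, None, :] >= lo) & (points[:, None, :] <= hi)).all(-1)

rng = np.random.default_rng(0)
lo, hi = sample_boxes(5, 2, rng)
print(covered(rng.uniform(size=(10, 2)), lo, hi).sum(axis=1))  # boxes per point
```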
We propose to tackle the problem of multiview 2D/3D rigid registration for intervention via a Point-Of-Interest Network for Tracking and Triangulation (POINT^2). POINT^2 learns to establish 2D point-to-point correspondences between the pre- and intra-intervention images by tracking a set of random POIs. The 3D pose of the pre-intervention volume is then estimated through a triangulation layer. In POINT^2, the unified framework of the POI tracker and the triangulation layer enables learning informative 2D features and estimating the 3D pose jointly. In contrast to existing approaches, POINT^2 requires only a single forward pass to achieve a reliable 2D/3D registration. As the POI tracker is shift-invariant, POINT^2 is more robust to the initial pose of the 3D pre-intervention image. Extensive experiments on a large-scale clinical cone-beam CT (CBCT) dataset show that the proposed POINT^2 method outperforms the existing learning-based method in terms of accuracy, robustness, and running time. Furthermore, when used as an initial pose estimator, our method also improves the robustness and speed of state-of-the-art optimization-based approaches tenfold.
https://arxiv.org/abs/1903.03896
Image classification is a difficult machine learning task, to which Convolutional Neural Networks (CNNs) have been applied for over 20 years. In recent years, instead of the traditional way of only connecting each layer to the next, shortcut connections have been proposed that connect a layer to layers further forward, which has been shown to facilitate the training of deep CNNs. However, since there are various ways to build shortcut connections, it is hard to manually design the best ones for a particular problem, especially given that designing the network architecture is already very challenging. In this paper, a hybrid evolutionary computation (EC) method is proposed to automatically evolve both the architecture of deep CNNs and the shortcut connections. The three major contributions of this work are: first, a new encoding strategy is proposed in which the architecture and the shortcut connections are encoded separately; second, a hybrid two-level EC method, combining particle swarm optimisation and genetic algorithms, is developed to search for optimal CNNs; lastly, an adjustable learning rate is introduced for the fitness evaluations, providing a better learning rate for the training process given a fixed number of epochs. The proposed algorithm is evaluated on three widely used image classification benchmark datasets and compared with 12 non-EC-based competitors and one EC-based competitor. The experimental results demonstrate that the proposed method outperforms all of the peer competitors in terms of classification accuracy.
https://arxiv.org/abs/1903.03893
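The particle-swarm half of the hybrid search follows the standard velocity update; a sketch in which each particle position would decode to a CNN architecture (the decoding and fitness evaluation are omitted, and the division of labor noted in the comment is our assumption):

```python
import numpy as np

def pso_step(x, v, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """One standard PSO update: inertia plus cognitive and social pulls."""
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v

# One plausible division of labor (our assumption, not the paper's stated
# design): GA crossover/mutation act on the shortcut-connection encoding,
# while PSO refines the separately encoded layer/block architecture.
```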