Current movie captioning architectures are not capable of mentioning characters by their proper names, replacing them with a generic “someone” tag. The lack of movie description datasets with visual annotations of characters surely plays a relevant role in this shortcoming. Recently, we proposed to extend the M-VAD dataset by introducing such information. In this paper, we present an improved version of the dataset, namely M-VAD Names, and its semi-automatic annotation procedure. The resulting dataset contains 63k visual tracks and 34k textual mentions, all associated with character identities. To showcase the features of the dataset and quantify the complexity of the naming task, we investigate multimodal architectures that replace the “someone” tags with proper character names in existing video captions. The evaluation is further extended by testing this approach on videos outside the M-VAD Names dataset.
https://arxiv.org/abs/1903.01489
Generative models that can model and predict sequences of future events can, in principle, learn to capture complex real-world phenomena, such as physical interactions. In particular, learning predictive models of videos offers an especially appealing mechanism to enable a rich understanding of the physical world: videos of real-world interactions are plentiful and readily available, and a model that can predict future video frames can not only capture useful representations of the world, but can be useful in its own right, for problems such as model-based robotic control. However, a central challenge in video prediction is that the future is highly uncertain: a sequence of past observations of events can imply many possible futures. Although a number of recent works have studied probabilistic models that can represent uncertain futures, such models are either extremely expensive computationally (as in the case of pixel-level autoregressive models), or do not directly optimize the likelihood of the data. In this work, we propose a model for video prediction based on normalizing flows, which allows for direct optimization of the data likelihood, and produces high-quality stochastic predictions. To our knowledge, our work is the first to propose multi-frame video prediction with normalizing flows. We describe an approach for modeling the latent space dynamics, and demonstrate that flow-based generative models offer a viable and competitive approach to generative modeling of video.
http://arxiv.org/abs/1903.01434
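The building block behind the flow-based approach above is an invertible transform with a tractable Jacobian log-determinant, which is what makes direct likelihood optimization possible. The following sketch shows a minimal RealNVP-style affine coupling layer in plain NumPy; it is background illustration only, not the authors' architecture, and the toy linear conditioner stands in for a neural network.

```python
import numpy as np

def affine_coupling_forward(x, shift_scale_net):
    """One affine coupling step (RealNVP-style): split the input, transform
    half of it conditioned on the other half. Invertible by construction,
    with a triangular Jacobian whose log-determinant is cheap to compute."""
    x1, x2 = np.split(x, 2, axis=-1)
    log_s, t = shift_scale_net(x1)          # any function of x1 only
    y2 = x2 * np.exp(log_s) + t             # element-wise affine transform
    log_det = log_s.sum(axis=-1)            # contribution to the data log-likelihood
    return np.concatenate([x1, y2], axis=-1), log_det

def affine_coupling_inverse(y, shift_scale_net):
    y1, y2 = np.split(y, 2, axis=-1)
    log_s, t = shift_scale_net(y1)
    x2 = (y2 - t) * np.exp(-log_s)          # exact inverse of the forward pass
    return np.concatenate([y1, x2], axis=-1)

# Toy conditioner: a fixed random linear map standing in for a neural network.
rng = np.random.default_rng(0)
W_s, W_t = 0.1 * rng.standard_normal((4, 4)), 0.1 * rng.standard_normal((4, 4))
net = lambda h: (h @ W_s, h @ W_t)

x = rng.standard_normal((2, 8))
y, log_det = affine_coupling_forward(x, net)
assert np.allclose(affine_coupling_inverse(y, net), x)  # invertibility check
```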
A recurrent issue in deep learning is the scarcity of data, in particular of precisely annotated data. Few publicly available databases are correctly annotated, and generating correct labels is very time-consuming. The present article investigates data augmentation strategies for neural network training, particularly for tasks related to drum transcription, which require very precise annotations. It examines state-of-the-art sound transformation algorithms for remixing noise and sinusoidal parts, remixing attacks, and transposing with and without time compensation, and compares them to basic regularization methods such as dropout and additive Gaussian noise. Finally, it shows how a CNN-based drum transcription algorithm benefits from the proposed data augmentation strategy.
http://arxiv.org/abs/1903.01416
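As a concrete illustration of one of the augmentation families above, the sketch below implements pitch transposition with and without time compensation using librosa. This is a hedged stand-in for the paper's sound transformation algorithms, not the authors' implementation; `transpose_compensated` and `transpose_uncompensated` are hypothetical helper names.

```python
import numpy as np
import librosa

def transpose_compensated(y, sr, semitones):
    # Pitch shift with duration preserved (librosa's phase-vocoder pipeline).
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=semitones)

def transpose_uncompensated(y, sr, semitones):
    # Naive transposition by resampling: treating the signal as if recorded at
    # a higher rate and playing it back at sr raises the pitch and shortens it.
    ratio = 2.0 ** (semitones / 12.0)
    return librosa.resample(y, orig_sr=sr * ratio, target_sr=sr)

sr = 22050
y = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr).astype(np.float32)  # 1 s test tone
augmented = [transpose_compensated(y, sr, s) for s in (-2, -1, 1, 2)]
```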
State-of-the-art singing voice separation is based on deep learning, making use of CNN structures with skip connections (such as the U-Net, Wave-U-Net, or MSDENSELSTM models). A key to the success of these models is the availability of a large amount of training data. In the following study, we are interested in singing voice separation for mono signals and compare the U-Net and the Wave-U-Net, which are structurally similar but work on different input representations. First, we report a few results on variations of the U-Net model. Second, we discuss the potential of state-of-the-art speech and music transformation algorithms for augmenting existing data sets and demonstrate that the effect of these augmentations depends on the signal representation used by the model. The results demonstrate a considerable improvement due to the augmentation for both models, but pitch transposition is the most effective augmentation strategy for the U-Net model, while transposition, time stretching, and formant shifting have a much more balanced effect on the Wave-U-Net model. Finally, we compare the two models on the same dataset.
http://arxiv.org/abs/1903.01415
Princeton WordNet is one of the most important resources for natural language processing, but it is only available for English. While it has been translated to many other languages using the expand approach, this is an expensive manual process. It would therefore be beneficial to have a high-quality automatic translation approach that could bring WordNet-dependent NLP techniques to new languages. The translation of wordnets is fundamentally complex because of the need to translate all senses of a word, including low-frequency senses, which is very challenging for current machine translation approaches. For this reason we leverage existing translations of WordNet in other languages to identify contextual information for wordnet senses from a large set of generic parallel corpora. We evaluate our approach using 10 translated wordnets for European languages. Our experiment shows a significant improvement over translation without any contextual information. Furthermore, we evaluate how the choice of pivot languages affects the performance of multilingual word sense disambiguation.
http://arxiv.org/abs/1903.01411
Radar is of vital importance in many fields, such as autonomous driving, safety, and surveillance applications. However, it suffers from stringent constraints on its design parametrization, leading to multiple trade-offs. For example, the bandwidth in FMCW radars is inversely proportional to both the maximum unambiguous range and the range resolution. In this work, we introduce a new method for circumventing radar design trade-offs. We propose the use of recent advances in computer vision, more specifically generative adversarial networks (GANs), to enhance low-resolution radar acquisitions into higher-resolution counterparts while maintaining the advantages of the low-resolution parametrization. The capability of the proposed method was evaluated on the velocity-resolution and range-azimuth trade-offs in micro-Doppler signatures and FMCW uniform linear array (ULA) radars, respectively.
http://arxiv.org/abs/1903.01392
We present a new approach for transferring dynamic robot control policies, such as biped locomotion, from simulation to real hardware. Key to our approach is to perform system identification of the model parameters $\mu$ of the hardware (e.g. friction, center of mass) in two distinct stages: before policy learning (pre-sysID) and after policy learning (post-sysID). Pre-sysID begins by collecting trajectories from the physical hardware based on a set of generic motion sequences. Because the trajectories may not be related to the task of interest, pre-sysID does not attempt to accurately identify the true value of $\mu$, but only to approximate the range of $\mu$ in order to guide the policy learning. Next, a Projected Universal Policy (PUP) is created by simultaneously training a network that projects $\mu$ to a low-dimensional latent variable $\eta$ and a family of policies that are conditioned on $\eta$. The second round of system identification (post-sysID) is then carried out by deploying the PUP on the robot hardware using task-relevant trajectories. We use Bayesian optimization to determine the values of $\eta$ that optimize the performance of the PUP on the real hardware. We have used this approach to create three successful biped locomotion controllers (walk forward, walk backward, walk sideways) on the Darwin OP2 robot.
http://arxiv.org/abs/1903.01390
We introduce $\alpha$-Rank, a principled evolutionary dynamics methodology for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs). The approach leverages continuous-time and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of agents, in the type of interactions (beyond dyadic), and the type of empirical games (symmetric and asymmetric). Current models are fundamentally limited in one or more of these dimensions, and are not guaranteed to converge to the desired game-theoretic solution concept (typically the Nash equilibrium). $\alpha$-Rank automatically provides a ranking over the set of agents under evaluation and provides insights into their strengths, weaknesses, and long-term dynamics in terms of basins of attraction and sink components. This is a direct consequence of our new model’s direct correspondence to the dynamical MCC solution concept when its ranking-intensity parameter, $\alpha$, is chosen to be large, which exactly forms the basis of $\alpha$-Rank. In contrast to the Nash equilibrium, which is a static solution concept based solely on fixed points, MCCs are a dynamical solution concept based on the Markov chain formalism, Conley’s Fundamental Theorem of Dynamical Systems, and the core ingredients of dynamical systems: fixed points, recurrent sets, periodic orbits, and limit cycles. Our $\alpha$-Rank method runs in polynomial time with respect to the total number of pure strategy profiles, whereas computing a Nash equilibrium for a general-sum game is known to be intractable. We introduce mathematical proofs that reveal the formal underpinnings of the $\alpha$-Rank methodology. We illustrate the method in canonical games and in AlphaGo, AlphaZero, MuJoCo Soccer, and Poker.
https://arxiv.org/abs/1903.01373
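The core computation behind $\alpha$-Rank can be illustrated compactly: build a Markov chain over pure strategies whose transitions favor better replies (with intensity $\alpha$) and rank strategies by their mass in the chain's stationary distribution. The sketch below is a simplified single-population version for a symmetric game; the logistic switch probability is a stand-in for the paper's finite-population fixation probabilities, so treat it as illustrative only.

```python
import numpy as np

def alpha_rank(payoff, alpha=10.0, noise=0.05):
    """Simplified single-population alpha-Rank sketch for a symmetric game.

    payoff[i, j] is the payoff of pure strategy i against j. Build a Markov
    chain over strategies where switches toward better replies are more
    likely (controlled by the ranking-intensity alpha), then rank strategies
    by their mass in the stationary distribution."""
    n = payoff.shape[0]
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                # Logistic switch probability: a simplification of the paper's
                # finite-population fixation probabilities.
                T[i, j] = noise / (1.0 + np.exp(-alpha * (payoff[j, i] - payoff[i, i])))
        T[i, i] = 1.0 - T[i].sum()
    # Stationary distribution: left eigenvector of T for eigenvalue 1.
    w, v = np.linalg.eig(T.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    return np.abs(pi) / np.abs(pi).sum()

# Rock-paper-scissors: the cycle gives every strategy equal long-term mass.
rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
print(alpha_rank(rps))   # approximately [1/3, 1/3, 1/3]
```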
Expert human drivers perform actions relying on traffic laws and on their previous experience. While traffic laws are easily embedded into an artificial brain, modeling the complex human behaviors that come from past experience is a more challenging task. One of these behaviors is the capability of communicating intentions and negotiating the right of way through driving actions, as when a driver entering a crowded roundabout observes other cars’ movements to guess the best time to merge in. In addition, each driver has their own unique driving style, which is conditioned both by personal characteristics, such as age and quality of sight, and by external factors, such as being late or in a bad mood. For these reasons, the interaction between different drivers is not trivial to simulate in a realistic manner. In this paper, this problem is addressed by developing a microscopic simulator using a deep reinforcement learning algorithm based on a combination of visual frames, representing the perception around the vehicle, and a vector of numerical parameters. In particular, the Asynchronous Advantage Actor-Critic algorithm has been extended to a multi-agent scenario in which every agent needs to learn to interact with other similar agents. Moreover, the model includes a novel architecture such that the driving style of each vehicle is adjustable by tuning some of its input parameters, making it possible to simulate drivers with different levels of aggressiveness and different desired cruising speeds.
http://arxiv.org/abs/1903.01365
Social robots or collaborative robots that have to interact with people in a reactive way are difficult to program. This difficulty stems from the different skills required of the programmer: to provide an engaging user experience, the behavior must include a sense of aesthetics while robustly operating in a continuously changing environment. The Playful framework allows composing such dynamic behaviors using a basic set of action and perception primitives. Within this framework, a behavior is encoded as a list of declarative statements corresponding to high-level sensory-motor couplings. To enable non-expert users to program such behaviors, we propose a Learning from Demonstration (LfD) technique that maps motion capture of humans directly to a Playful script. The approach proceeds by identifying the sensory-motor couplings that are active at each step using the Viterbi path in a Hidden Markov Model (HMM). Given these activation patterns, binary classifiers called evaluations are trained to associate activations with sensory data. Modularity is increased by clustering the sensory-motor couplings, leading to a hierarchical tree structure. The novelty of the proposed approach is that the learned behavior is encoded not in terms of trajectories in a task space, but as couplings between sensory information and high-level motor actions. This provides advantages in terms of the behavioral generalization and reactivity displayed by the robot.
http://arxiv.org/abs/1903.01352
With the primary objective of human-robot interaction being to support humans’ goals, there exists a need to formally synthesize robot controllers that can provide the desired service. Synthesis techniques have the benefit of providing formal guarantees for specification satisfaction. There is potential to apply these techniques for devising robot controllers whose specifications are coupled with human needs. This paper explores the use of formal methods to construct human-aware robot controllers to support the productivity requirements of humans. We tackle these types of scenarios via human workload-informed models and reactive synthesis. This strategy allows us to synthesize controllers that fulfill formal specifications that are expressed as linear temporal logic formulas. We present a case study in which we reason about a work delivery and pickup task such that the robot increases worker productivity, but not stress induced by high work backlog. We demonstrate our controller using the Toyota HSR, a mobile manipulator robot. The results demonstrate the realization of a robust robot controller that is guaranteed to properly reason and react in collaborative tasks with human partners.
http://arxiv.org/abs/1903.01350
This paper describes our approach to the DIUx xView 2018 Detection Challenge [1]. This challenge focuses on a new satellite imagery dataset. The dataset contains 60 object classes that are highly imbalanced. Due to the imbalanced nature of the dataset, the training process becomes significantly more challenging. To address this problem, we introduce a novel Reduced Focal Loss function, which brought us 1st place in the DIUx xView 2018 Detection Challenge.
http://arxiv.org/abs/1903.01347
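The abstract does not spell out the Reduced Focal Loss itself, but it builds on the standard focal loss (Lin et al., 2017), which down-weights easy examples so training concentrates on the rare, hard classes. A minimal NumPy sketch of that base loss, for illustration only:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard focal loss for binary detection targets.

    The (1 - p_t)**gamma factor down-weights easy, well-classified examples so
    training concentrates on hard ones; the "Reduced Focal Loss" in the paper
    modifies this ingredient (its exact form is not given in the abstract).
    p: predicted foreground probabilities, y: binary ground-truth labels."""
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -(alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, 1e-8, 1.0))).mean()

# Easy examples (p_t near 1) contribute almost nothing to the loss:
print(focal_loss(np.array([0.99, 0.6]), np.array([1, 1])))
```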
In this paper we propose a hybrid architecture of actor-critic algorithms for reinforcement learning in parameterized action space, which consists of multiple parallel sub-actor networks that decompose the structured action space into simpler action spaces, along with a critic network that guides the training of all sub-actor networks. While this paper mainly focuses on parameterized action spaces, the proposed architecture, which we call hybrid actor-critic, can be extended to more general action spaces that have a hierarchical structure. We present an instance of the hybrid actor-critic architecture based on proximal policy optimization (PPO), which we refer to as hybrid proximal policy optimization (H-PPO). Our experiments test H-PPO on a collection of tasks with parameterized action space, where H-PPO demonstrates superior performance over previous methods of parameterized-action reinforcement learning.
http://arxiv.org/abs/1903.01344
Navigation is a rich and well-grounded problem domain that drives progress in many different areas of research: perception, planning, memory, exploration, and optimisation in particular. Historically these challenges have been separately considered and solutions built that rely on stationary datasets - for example, recorded trajectories through an environment. These datasets cannot be used for decision-making and reinforcement learning, however, and in general the perspective of navigation as an interactive learning task, where the actions and behaviours of a learning agent are learned simultaneously with the perception and planning, is relatively unsupported. Thus, existing navigation benchmarks generally rely on static datasets (Geiger et al., 2013; Kendall et al., 2015) or simulators (Beattie et al., 2016; Shah et al., 2018). To support and validate research in end-to-end navigation, we present StreetLearn: an interactive, first-person, partially-observed visual environment that uses Google Street View for its photographic content and broad coverage, and give performance baselines for a challenging goal-driven navigation task. The environment code, baseline agent code, and the dataset are available at this http URL
http://arxiv.org/abs/1903.01292
Objective: Automatic artery/vein (A/V) segmentation from fundus images is required to track blood vessel changes occurring with many pathologies, including retinopathy and cardiovascular pathologies. One of the clinical measures that quantifies vessel changes is the arterio-venous ratio (AVR), which represents the ratio between artery and vein diameters. This measure significantly depends on the accuracy of vessel segmentation and classification into arteries and veins. This paper proposes a fast, novel method for semantic A/V segmentation combining deep learning and graph propagation. Methods: A convolutional neural network (CNN) is proposed to jointly segment and classify vessels into arteries and veins. The initial CNN labeling is propagated through a graph representation of the retinal vasculature, whose nodes are defined as the vessel branches and whose edges are weighted by the cost of linking pairs of branches. To efficiently propagate the labels, the graph is simplified into its minimum spanning tree. Results: The method achieves an accuracy of 94.8% for vessel segmentation. The A/V classification achieves a specificity of 92.9% with a sensitivity of 93.7% on the CT-DRIVE database, compared to the state-of-the-art specificity and sensitivity of 91.7% each. Conclusion: The results show that our method outperforms the leading previous works on a public dataset for A/V classification and is by far the fastest. Significance: The proposed global AVR, calculated on the whole fundus image using our automatic A/V segmentation method, can better track vessel changes associated with diabetic retinopathy than the standard local AVR calculated only around the optic disc.
http://arxiv.org/abs/1903.01330
Deep-neural-network-based image reconstruction has demonstrated promising performance in medical imaging for under-sampled and low-dose scenarios. However, it requires a large amount of memory and extensive time for training. It is especially challenging to train reconstruction networks for three-dimensional computed tomography (CT) because of the high resolution of CT images. The purpose of this work is to reduce the memory and time consumption of training reconstruction networks for CT, to make the training practical on current hardware while maintaining the quality of the reconstructed images. We unrolled the proximal gradient descent algorithm for iterative image reconstruction to finite iterations and replaced the terms related to the penalty function with trainable convolutional neural networks (CNNs). The network was trained greedily, iteration by iteration, in the image domain on patches, which requires a reasonable amount of memory and time on a mainstream graphics processing unit (GPU). To overcome the local-minimum problem caused by greedy learning, we used a deep UNet as the CNN and incorporated separable quadratic surrogates with ordered subsets for data fidelity, so that the solution could escape from undesirable local minima and achieve better image quality. The proposed method achieved image quality comparable to state-of-the-art neural networks for CT image reconstruction on 2D sparse-view and limited-angle problems on the low-dose CT challenge dataset.
http://arxiv.org/abs/1810.03999
Industrial robots play an important role in the manufacturing process. Since robots are usually set up in parallel-serial settings, the breakdown of a single robot has a negative effect on the entire manufacturing process, slowing it down. Therefore, fault diagnostic systems based on the internal signals of robots have gained a lot of attention as essential components of the services provided for industrial robots. Current fault diagnostic algorithms extract features from the internal signals of the robot while the robot is healthy in order to build a model representing the normal robot behavior. During testing, the extracted features are compared to the normal behavior to detect any deviation. The main challenge with existing fault diagnostic algorithms is that when the task of the robot changes, the extracted features differ from those of the normal behavior, so the algorithm raises a false alarm. To eliminate the false alarm, fault diagnostic algorithms require the model to be retrained with normal data from the new task. In this paper, domain adaptation, a.k.a. transfer learning, is used to transfer the knowledge of the trained model from one task to another in order to avoid the need for retraining and to eliminate the false alarm. The results of the proposed algorithm on a real dataset show the ability of domain adaptation to distinguish an operation change from a mechanical condition change.
http://arxiv.org/abs/1809.08626
We address the problem of controlling the workspace of a 3-DoF mobile robot. In a human-robot shared space, robots should navigate in a human-acceptable way according to the users’ demands. For this purpose, we employ virtual borders, i.e. non-physical borders, that allow a user to restrict the robot’s workspace. To this end, we propose an interaction method based on a laser pointer to intuitively define virtual borders. This interaction method uses a previously developed framework based on robot guidance to change the robot’s navigational behavior. Furthermore, we extend this framework to increase flexibility by considering different types of virtual borders, i.e. polygons and curves separating an area. We evaluated our method with 15 non-expert users with respect to correctness, accuracy, and teaching time. The experimental results revealed high accuracy and teaching time linear in the border length, while correctly incorporating the borders into the robot’s navigational map. Finally, our user study showed that non-expert users can employ our interaction method.
http://arxiv.org/abs/1708.06274
Model update lies at the heart of object tracking. Generally, model update is formulated as an online learning problem where a target model is learned over the online training set. Our key innovation is to \emph{formulate the model update problem in the meta-learning framework and learn the online learning algorithm itself using large numbers of offline videos}, i.e., \emph{learning to update}. The learned updater takes the online training set as input and outputs an updated target model. As a first attempt, we design the learned updater based on recurrent neural networks (RNNs) and demonstrate its application in a template-based tracker and a correlation filter-based tracker. Our learned updater consistently improves the base trackers, runs faster than real time on a GPU, and requires a small memory footprint during testing. Experiments on standard benchmarks demonstrate that our learned updater outperforms commonly used update baselines, including the efficient exponential moving average (EMA)-based update and the well-designed stochastic gradient descent (SGD)-based update. Equipped with our learned updater, the template-based tracker achieves state-of-the-art performance among real-time trackers on a GPU.
http://arxiv.org/abs/1806.07078
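The EMA baseline mentioned above is simple enough to state in a few lines; the learned updater replaces this hand-designed rule with an RNN trained offline. A sketch with hypothetical variable names:

```python
import numpy as np

def ema_update(template, new_template, lr=0.01):
    """Exponential-moving-average template update, the classical baseline the
    learned updater is compared against: blend the running target model with
    the model extracted from the newest frame."""
    return (1.0 - lr) * template + lr * new_template

# Running template over a tracking sequence (features stand in for the model).
template = np.zeros(128)
for frame_features in np.random.randn(100, 128):
    template = ema_update(template, frame_features)
```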
Pitch detection is a fundamental problem in speech processing, as F0 is used in a large number of applications. Recent articles have proposed deep learning for robust pitch tracking. In this paper, we consider voicing detection as a classification problem and F0 contour estimation as a regression problem. For both tasks, acoustic features from multiple domains and traditional machine learning methods are used. The discriminative power of existing and proposed features is assessed through mutual information. Multiple supervised and unsupervised approaches are compared. A significant relative reduction of voicing errors over the best baseline is obtained: 20% with the best clustering method (K-means) and 45% with a Multi-Layer Perceptron. For F0 contour estimation, however, the benefits of regression techniques are limited. We investigate whether these objective gains translate into gains in a parametric synthesis task. Clear perceptual preferences are observed for the proposed approach over two widely used baselines (RAPT and DIO).
http://arxiv.org/abs/1903.01290
In this research, we manually create high-quality datasets in the digital humanities domain for the evaluation of language models, specifically word embedding models. The first step comprises the creation of unigram and n-gram datasets for two fantasy novel book series and two task types each: analogy and doesn’t-match. This is followed by the training of models on the two book series with various popular word embedding model types such as word2vec, GloVe, fastText, and LexVec. Finally, we evaluate the suitability of word embedding models for such specific relation extraction tasks in a situation of comparably small corpus sizes. In the evaluations, we also investigate and analyze particular aspects such as the impact of corpus term frequencies and task difficulty on accuracy. The datasets, the underlying system, and the word embedding models are available on GitHub and can easily be extended with new datasets and tasks, used to reproduce the presented results, or transferred to other domains.
http://arxiv.org/abs/1903.01284
One of the big challenges in Linked Data consumption is to create visual and natural language interfaces to the data usable for non-technical users. Ontodia provides support for diagrammatic data exploration, showcased in this publication in combination with the Wikidata dataset. We present improvements to the natural language interface regarding exploring and querying Linked Data entities. The method uses models of distributional semantics to find and rank entity properties related to user input in Ontodia. Various word embedding types and model settings are evaluated, and the results show that user experience in visual data exploration benefits from the proposed approach.
http://arxiv.org/abs/1903.01275
Learning models of user behaviour is an important problem that is broadly applicable across many application domains requiring human-robot interaction. In this work we show that it is possible to learn a generative model for distinct user behavioral types, extracted from human demonstrations, by enforcing clustering of preferred task solutions within the latent space. We use this model to differentiate between user types and to find cases with overlapping solutions. Moreover, we can alter an initially guessed solution to satisfy the preferences that constitute a particular user type by backpropagating through the learned differentiable model. An advantage of structuring generative models in this way is that it allows us to extract causal relationships between symbols that might form part of the user’s specification of the task, as manifested in the demonstrations. We show that the proposed method is capable of correctly distinguishing between three user types, who differ in degrees of cautiousness in their motion, while performing the task of moving objects with a kinesthetically driven robot in a tabletop environment. Our method successfully identifies the correct type, within the specified time, in 99% [97.8 - 99.8] of the cases, which outperforms an IRL baseline. We also show that our proposed method correctly changes a default trajectory to one satisfying a particular user specification even with unseen objects. The resulting trajectory is shown to be directly implementable on a PR2 humanoid robot completing the same task.
http://arxiv.org/abs/1903.01267
Deep Neural Networks (DNNs) have improved the quality of several non-safety-related products in the past years. However, before DNNs are deployed to safety-critical applications, their robustness needs to be systematically analyzed. A common challenge for DNNs occurs when input is dissimilar to the training set, which might lead to high-confidence predictions despite proper knowledge of the input. Several previous studies have proposed to complement DNNs with a supervisor that detects when inputs are outside the scope of the network. Most of these supervisors, however, are developed and tested for a selected scenario using a specific performance metric. In this work, we emphasize the need to assess and compare the performance of supervisors in a structured way. We present a framework constituted by four datasets organized in six test cases combined with seven evaluation metrics. The test cases provide varying complexity and include data from publicly available sources as well as a novel dataset consisting of images from simulated driving scenarios, which we plan to make publicly available. Our framework can be used to support DNN supervisor evaluation, which in turn could be used to motivate the development, validation, and deployment of DNNs in safety-critical applications.
http://arxiv.org/abs/1903.01263
In this paper, we present Russian-language datasets in the digital humanities domain for the evaluation of word embedding techniques and similar language modeling and feature learning algorithms. The datasets are split into two task types, word intrusion and word analogy, and contain 31,362 task units in total. The characteristics of the tasks and datasets are that they build upon small, domain-specific corpora and that the datasets contain a high number of named entities. The datasets were created manually for two fantasy novel book series (“A Song of Ice and Fire” and “Harry Potter”). We provide baseline evaluations with popular word embedding models trained on the book corpora for the given tasks, for both the Russian and English language versions of the datasets. Finally, we compare and analyze the results and discuss the specifics of the Russian language with regard to the problem setting.
http://arxiv.org/abs/1903.08739
Automated brain lesion segmentation provides valuable information for the analysis and intervention of patients. In particular, methods based on convolutional neural networks (CNNs) have achieved state-of-the-art segmentation performance. However, CNNs usually require a decent amount of annotated data, which may be costly and time-consuming to obtain. Since unannotated data is generally abundant, it is desirable to use unannotated data to improve the segmentation performance for CNNs when limited annotated data is available. In this work, we propose a semi-supervised learning (SSL) approach to brain lesion segmentation, where unannotated data is incorporated into the training of CNNs. We adapt the mean teacher model, which is originally developed for SSL-based image classification, for brain lesion segmentation. Assuming that the network should produce consistent outputs for similar inputs, a loss of segmentation consistency is designed and integrated into a self-ensembling framework. Specifically, we build a student model and a teacher model, which share the same CNN architecture for segmentation. The student and teacher models are updated alternately. At each step, the student model learns from the teacher model by minimizing the weighted sum of the segmentation loss computed from annotated data and the segmentation consistency loss between the teacher and student models computed from unannotated data. Then, the teacher model is updated by combining the updated student model with the historical information of teacher models using an exponential moving average strategy. For demonstration, the proposed approach was evaluated on ischemic stroke lesion segmentation, where it improves stroke lesion segmentation with the incorporation of unannotated data.
http://arxiv.org/abs/1903.01248
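A minimal PyTorch sketch of the mean-teacher scheme described above, assuming `student` and `teacher` share one architecture; the paper's alternating update schedule and its loss weighting are collapsed into a single simplified step here, so this is illustrative rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def mean_teacher_step(student, teacher, optimizer, x_lab, y_lab, x_unlab,
                      consistency_weight=1.0, ema_decay=0.99):
    """One mean-teacher training step for segmentation (illustrative sketch).

    The student minimizes a supervised loss on annotated data plus a
    consistency loss against the teacher on unannotated data; the teacher is
    then updated as an exponential moving average (EMA) of the student."""
    sup_loss = F.cross_entropy(student(x_lab), y_lab)
    with torch.no_grad():
        teacher_probs = torch.softmax(teacher(x_unlab), dim=1)
    student_probs = torch.softmax(student(x_unlab), dim=1)
    cons_loss = F.mse_loss(student_probs, teacher_probs)
    loss = sup_loss + consistency_weight * cons_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():  # EMA teacher update
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(ema_decay).add_(s_p, alpha=1.0 - ema_decay)
    return loss.item()
```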
Lane change prediction of surrounding vehicles is a key building block of path planning. The focus has been on increasing the accuracy of prediction by posing it purely as a function estimation problem at the cost of model understandability. However, the efficacy of any lane change prediction model can be improved when both corner and failure cases are humanly understandable. We propose an attention-based recurrent model to tackle both understandability and prediction quality. We also propose metrics which reflect the discomfort felt by the driver. We show encouraging results on a publicly available dataset and proprietary fleet data.
http://arxiv.org/abs/1903.01246
Learning from Demonstration depends on a robot learner generalising its learned model to unseen conditions, as it is not feasible for a person to provide a demonstration set that accounts for all possible variations in non-trivial tasks. While there are many learning methods that can handle interpolation of observed data effectively, extrapolation from observed data offers a much greater challenge. To address this problem of generalisation, this paper proposes a modified Task-Parameterised Gaussian Mixture Regression method that considers the relevance of task parameters during trajectory generation, as determined by variance in the data. The benefits of the proposed method are first explored using a simulated reaching task data set. Here it is shown that the proposed method offers far-reaching, low-error extrapolation abilities that are different in nature to existing learning methods. Data collected from novice users for a real-world manipulation task is then considered, where it is shown that the proposed method is able to effectively reduce grasping performance errors by ${\sim30\%}$ and extrapolate to unseen grasp targets under real-world conditions. These results indicate the proposed method serves to benefit novice users by placing less reliance on the user to provide high quality demonstration data sets.
http://arxiv.org/abs/1903.01240
Computer-aided detection (CADe) systems are developed to assist pathologists in slide assessment, increasing diagnosis efficiency and reducing missed inspections. Many studies have shown that such CADe systems with deep learning approaches outperform those using conventional methods that rely on hand-crafted features based on field knowledge. However, most developers who adopted deep learning models focused directly on the efficacy of outcomes, without providing comprehensive explanations of why their proposed frameworks work effectively. In this study, we designed four experiments to verify the consecutive concepts, showing that the deep features learned from pathological patches are interpretable by domain knowledge of pathology and enlightening for clinical diagnosis in the task of lesion detection. The experimental results show that the activation features work as morphological descriptors for specific cells or tissues, which agrees with the clinical rules in classification. That is, the deep learning framework not only detects the distribution of tumor cells but also recognizes lymphocytes, collagen fibers, and some other non-cell structural tissues. Most of the characteristics learned by the deep learning models summarize the detection rules that can be recognized by experienced pathologists, whereas some features may not be intuitive to domain experts yet remain discriminative in classification for machines. Those features are worth further study in order to find reasonable correlations to pathological knowledge, from which pathology experts may draw inspiration for exploring new characteristics in diagnosis.
http://arxiv.org/abs/1903.01214
Detection and recognition of staircases as upstairs, downstairs, and negatives (e.g., ladders) is fundamental for assisting the visually impaired to travel independently in unfamiliar environments. Previous research has focused on using massive amounts of RGB-D scene data to train traditional machine learning (ML) models to detect and recognize staircases. However, the performance of traditional ML techniques is limited by the amount of labeled RGB-D staircase data. In this paper, we apply an unsupervised domain adaptation approach in deep architectures to transfer knowledge learned from a labeled RGB-D stationary-staircase dataset to an unlabeled RGB-D escalator dataset. By utilizing the domain adaptation method, our feedforward convolutional neural network (CNN)-based feature extractor with 5 convolution layers achieves 100% classification accuracy on the labeled stationary-staircase test data and 80.6% classification accuracy on the unlabeled escalator data. We demonstrate the success of the approach for classifying staircases across two domains with a limited amount of data. To further demonstrate the effectiveness of the approach, we also validate the same CNN model without domain adaptation and compare its results with those of our proposed architecture.
http://arxiv.org/abs/1903.01212
Memorability is considered to be an important characteristic of visual content, and for advertising and educational purposes it is the most important one. Despite numerous studies on understanding and predicting image memorability, there are almost no achievements in memorability modification. In this work, we study two possible approaches to image modification that may influence memorability. The visual features that directly influence memorability remain unknown, hence memorability cannot be controlled manually. As a solution, we let a GAN learn these features deeply from labeled data, and then use it for conditional generation of new images. By analogy with algorithms that edit facial attributes, we consider memorability as yet another attribute and operate on it in the same way. The obtained data are also interesting for analysis, simply because there are no real-world examples of successfully changing image memorability while preserving an image’s other attributes. We believe this may give many new answers to the question “what makes an image memorable?” Apart from that, we also study the influence on memorability of conventional photo-editing tools (Photoshop, Instagram, etc.) used daily by a wide audience. In this case, we start from real practical methods and study them using statistics and recent advances in memorability prediction. Photographers, designers, and advertisers will benefit directly from the results of this study.
https://arxiv.org/abs/1811.03825
Spatio-temporal feature learning is of central importance for action recognition in videos. Existing deep neural network models either learn spatial and temporal features independently (C2D) or jointly with unconstrained parameters (C3D). In this paper, we propose a novel neural operation which encodes spatio-temporal features collaboratively by imposing a weight-sharing constraint on the learnable parameters. In particular, we perform 2D convolution along three orthogonal views of volumetric video data, which learns spatial appearance and temporal motion cues respectively. By sharing the convolution kernels of different views, spatial and temporal features are collaboratively learned and thus benefit from each other. The complementary features are subsequently fused by a weighted summation whose coefficients are learned end-to-end. Our approach achieves state-of-the-art performance on large-scale benchmarks and won 1st place in the Moments in Time Challenge 2018. Moreover, based on the learned coefficients of the different views, we are able to quantify the contributions of spatial and temporal features. This analysis sheds light on the interpretability of the model and may also guide the future design of algorithms for video recognition.
http://arxiv.org/abs/1903.01197
Studying the complexity of various bribery problems has been one of the main research focuses in computational social choice. In all the models of bribery studied so far, the briber has to pay every voter some amount of money depending on what the briber wants the voter to report, and the briber has some budget at her disposal. Although these models successfully capture many real-world applications, in many other scenarios the voters may be unwilling to deviate too much from their true preferences. In this paper, we study the computational complexity of the problem of finding a preference profile which is as close to the true preference profile as possible and still achieves the briber’s goal subject to budget constraints. We call this problem Optimal Bribery. We consider three important measures of distance, namely swap distance, footrule distance, and maximum displacement distance, and resolve the complexity of the optimal bribery problem for many common voting rules; the three measures are illustrated in the sketch after this paragraph. We show that the problem is polynomial-time solvable for the plurality and veto voting rules for all three measures of distance. On the other hand, we prove that the problem is NP-complete for a class of scoring rules which includes the Borda voting rule, as well as for the maximin, Copeland$^\alpha$ for any $\alpha\in[0,1]$, and Bucklin voting rules, for all three measures of distance, even when the distance allowed per voter is $1$ for the swap and maximum displacement distances and $2$ for the footrule distance, and even without the budget constraints (which corresponds to having an infinite budget). For the $k$-approval voting rule for any constant $k>1$ and the simplified Bucklin voting rule, we show that the problem is NP-complete for the swap distance even when the distance allowed is $2$, and for the footrule distance even when the distance allowed is $4$, even without the budget constraints.
http://arxiv.org/abs/1901.08711
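The three per-voter distance measures are standard and easy to state over rankings (lists of candidates from most to least preferred). A small sketch with hypothetical helper names:

```python
def swap_distance(a, b):
    """Kendall-tau (bubble-sort) distance: minimum number of adjacent swaps
    turning ranking a into ranking b. Rankings are lists of candidate ids."""
    pos = {c: i for i, c in enumerate(b)}
    seq = [pos[c] for c in a]
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq))
               if seq[i] > seq[j])           # count pairwise inversions

def footrule_distance(a, b):
    """Spearman footrule: total displacement of each candidate's position."""
    pos = {c: i for i, c in enumerate(b)}
    return sum(abs(i - pos[c]) for i, c in enumerate(a))

def max_displacement_distance(a, b):
    """Largest displacement of any single candidate between the rankings."""
    pos = {c: i for i, c in enumerate(b)}
    return max(abs(i - pos[c]) for i, c in enumerate(a))

a, b = ['x', 'y', 'z'], ['z', 'x', 'y']
print(swap_distance(a, b), footrule_distance(a, b), max_displacement_distance(a, b))
# -> 2 4 2
```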
Textual information in a captured scene plays an important role in scene interpretation and decision making. Dedicated research efforts are ongoing to detect and recognize textual data accurately in images. Though there exist methods that can successfully detect complex text regions present in a scene, to the best of our knowledge there is no work that modifies the textual information in an image. This paper presents a simple text editor that can edit/modify the textual part of an image. Apart from correcting errors in the text part of the image, this work can directly and drastically increase the reusability of images. In this work, we first focus on the problem of generating unobserved characters with a font and color similar to those of an observed text character present in a natural scene, with minimum user intervention. To generate the characters, we propose a multi-input neural network that adapts the font characteristics of given source characters and generates desired target characters with similar font features. We also propose a network that transfers color from source to target characters without any visible distortion. Next, we place the generated character in a word, modifying it while maintaining visual consistency with the other characters in the word. The proposed method is a unified platform that can work like a simple text editor and edit texts in images. We tested our methodology on the popular ICDAR 2011 and ICDAR 2013 datasets, and the results are reported here.
http://arxiv.org/abs/1903.01192
We address the problem of executing tool-using manipulation skills in scenarios where the objects to be used may vary. We assume that point clouds of the tool and target object can be obtained, but no interpretation or further knowledge about these objects is provided. The system must interpret the point clouds and decide how to use the tool to complete a manipulation task with a target object; this means it must adjust motion trajectories appropriately to complete the task. We tackle three everyday manipulations: scraping material from a tool into a container, cutting, and scooping from a container. Our solution encodes these manipulation skills in a generic way, with parameters that can be filled in at run-time via queries to a robot perception module; the perception module abstracts the functional parts for the tool and extracts key parameters that are needed for the task. The approach is evaluated in simulation and with selected examples on a PR2 robot.
http://arxiv.org/abs/1803.02743
Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method of dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information between and across the visual and language modalities. It can robustly capture the high-level interactions between the language and vision domains, thus significantly improving the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the target modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.
https://arxiv.org/abs/1812.05252
Learning with a primary objective, such as softmax cross entropy for classification and sequence generation, has been the norm for training deep neural networks for years. Although being a widely-adopted approach, using cross entropy as the primary objective exploits mostly the information from the ground-truth class for maximizing data likelihood, and largely ignores information from the complement (incorrect) classes. We argue that, in addition to the primary objective, training also with a complement objective that leverages information from the complement classes can be effective in improving model performance. This motivates us to study a new training paradigm that maximizes the likelihood of the ground-truth class while neutralizing the probabilities of the complement classes. We conduct extensive experiments on multiple tasks ranging from computer vision to natural language understanding. The experimental results confirm that, compared to conventional training with just one primary objective, training also with the complement objective further improves the performance of state-of-the-art models across all tasks. In addition to the accuracy improvement, we also show that models trained with both primary and complement objectives are more robust to single-step adversarial attacks.
http://arxiv.org/abs/1903.01182
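One concrete way to “neutralize the probabilities of the complement classes” is to maximize the entropy of the predicted distribution restricted to the incorrect classes. The PyTorch sketch below illustrates that idea; the exact complement objective and its weighting in the paper may differ, so treat the formulation as an assumption.

```python
import torch
import torch.nn.functional as F

def complement_entropy(logits, targets, eps=1e-8):
    """Entropy over the complement (non-ground-truth) classes: a flat
    distribution over the incorrect classes has high entropy, so maximizing
    this term "neutralizes" them. Illustrative; the paper's exact form and
    weighting are not reproduced here."""
    p = F.softmax(logits, dim=1)
    p_y = p.gather(1, targets.unsqueeze(1))                 # ground-truth prob
    p_comp = p / (1.0 - p_y + eps)                          # renormalize complements
    p_comp = p_comp.scatter(1, targets.unsqueeze(1), 0.0)   # drop the true class
    return -(p_comp * torch.log(p_comp + eps)).sum(dim=1).mean()

def total_loss(logits, targets, beta=1.0):
    # Primary objective minus the complement entropy we want to maximize.
    return F.cross_entropy(logits, targets) - beta * complement_entropy(logits, targets)
```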
Spiking Neural Networks are powerful computational modelling tools that have attracted much interest because of their bio-inspired modelling of synaptic interactions between neurons. Most of the research employing spiking neurons has been non-behavioural and discontinuous. In contrast, this paper presents a recurrent spiking controller that is capable of solving nonlinear control problems in continuous domains, using a popular topology evolution algorithm as the learning mechanism. We propose two mechanisms necessary for decoding continuous signals from discrete spike transmission: (i) a background current component to maintain frequency sufficiency for spike rate decoding, and (ii) a general network structure that derives strength from topology evolution. We demonstrate that the proposed spiking controller learns to discover functional solutions significantly faster than sigmoidal neural networks when solving a classic nonlinear control problem.
http://arxiv.org/abs/1903.01180
We propose PanopticFusion, a novel online volumetric semantic mapping system at the level of stuff and things. In contrast to previous semantic mapping systems, PanopticFusion is able to densely predict class labels for background regions (stuff) and individually segment arbitrary foreground objects (things). In addition, our system can reconstruct a large-scale scene and extract a labeled mesh thanks to its use of a spatially hashed volumetric map representation. Our system first predicts pixel-wise panoptic labels (class labels for stuff regions and instance IDs for thing regions) for incoming RGB frames by fusing 2D semantic and instance segmentation outputs. The predicted panoptic labels are integrated into the volumetric map together with depth measurements, while keeping the consistency of the instance IDs, which can vary from frame to frame, by referring to the 3D map at that moment. In addition, we construct a fully connected conditional random field (CRF) model with respect to panoptic labels for map regularization. For online CRF inference, we propose a novel unary potential approximation and a map division strategy. We evaluated the performance of our system on the ScanNet (v2) dataset. PanopticFusion outperformed or was comparable to state-of-the-art offline 3D DNN methods in both semantic and instance segmentation benchmarks. We also demonstrate a promising augmented reality application using a 3D panoptic map generated by the proposed system.
http://arxiv.org/abs/1903.01177
Visual localization is a key step in many robotics pipelines, allowing the robot to (approximately) determine its position and orientation in the world. An efficient and scalable approach to visual localization is to use image retrieval techniques. These approaches identify the image most similar to a query photo in a database of geo-tagged images and approximate the query’s pose via the pose of the retrieved database image. However, image retrieval across drastically different illumination conditions, e.g. day and night, is still a problem with unsatisfactory results, even in this age of powerful neural models. This is due to the lack of a suitably diverse dataset with true correspondences to perform end-to-end learning. A recent class of neural models allows for realistic translation of images among visual domains with relatively little training data and, most importantly, without ground-truth pairings. In this paper, we explore the task of accurately localizing images captured from two traversals of the same area in both day and night. We propose ToDayGAN - a modified image-translation model to alter nighttime driving images to a more useful daytime representation. We then compare the daytime and translated night images to obtain a pose estimate for the night image using the known 6-DOF position of the closest day image. Our approach improves localization performance by over 250% compared to the current state of the art, in the context of standard metrics in multiple categories.
http://arxiv.org/abs/1809.09767
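The retrieval step itself reduces to a nearest-neighbor search over global image descriptors; in the pipeline above, the night-time query would first be translated to a daytime image by ToDayGAN before its descriptor is computed. A hedged NumPy sketch with hypothetical names:

```python
import numpy as np

def retrieve_pose(query_desc, db_descs, db_poses):
    """Retrieval-based localization sketch: approximate the query pose by the
    pose of the most similar database image (cosine similarity on global
    descriptors). db_poses holds the known 6-DOF poses of the day images."""
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    q = query_desc / np.linalg.norm(query_desc)
    best = np.argmax(db @ q)     # index of the closest day image
    return db_poses[best]
```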
We investigate the manipulation of power indices in TU-cooperative games by stimulating (subject to a budget constraint) changes in the propensity of other players to participate in the game. We present several algorithms showing that the problem is often tractable for so-called network centrality games and influence attribution games, as well as an example where optimal manipulation is intractable even though computing power indices is feasible.
http://arxiv.org/abs/1903.01165
We conduct an investigation of various hyper-parameters of neural networks used to generate spectral envelopes for singing synthesis. Two perceptual tests are performed: the first compares two models directly, and the other ranks models with a mean opinion score. With these tests we show that, when learning to predict spectral envelopes, 2D convolutions are superior to previously proposed 1D convolutions, and that predicting multiple frames in an iterated fashion during training is superior to injecting noise into the input data. An experimental investigation of whether to learn to predict a probability distribution vs. single samples was performed but turned out to be inconclusive. We propose a network architecture that incorporates the improvements we found to be useful, and we show in our experiments that this network produces better results than other state-of-the-art methods.
http://arxiv.org/abs/1903.01161
This paper presents a novel approach for learning STRIPS action models from examples that compiles this inductive learning task into a classical planning task. Interestingly, the compilation approach is flexible to different amounts of available input knowledge; the learning examples can range from a set of plans (with their corresponding initial and final states) to just a pair of initial and final states (no intermediate action or state is given). Moreover, the compilation accepts partially specified action models and it can be used to validate whether the observation of a plan execution follows a given STRIPS action model, even if this model is not fully specified.
http://arxiv.org/abs/1903.01153
To improve classification performance in the context of hyperspectral image processing, many works have been developed based on two common strategies, namely spatial-spectral information integration and the utilization of neural networks. However, both strategies typically require more training data than classical algorithms, aggravating the shortage of labeled samples. In this paper, we propose a novel framework that organically combines an existing spectrum-based deep metric learning model and the conditional random field algorithm. The deep metric learning model is supervised by a center loss and is used to produce spectrum-based features that gather more tightly within classes in Euclidean space. The conditional random field with Gaussian edge potentials, which was first proposed for the image segmentation problem, is utilized to jointly account for both the geometric distance between two pixels and the Euclidean distance between their corresponding features extracted by the deep metric learning model. The final predictions are given by the conditional random field. Overall, the proposed framework is trained on spectral pixels at the deep metric learning stage and utilizes half-handcrafted spatial features at the conditional random field stage. This arrangement alleviates the shortage of training data to some extent. Experiments on two real hyperspectral images demonstrate the advantages of the proposed method in terms of both classification accuracy and computation cost.
http://arxiv.org/abs/1903.06258
We focus on a replanning scenario for quadrotors where time efficiency, a non-static initial state, and dynamical feasibility are of great significance. We propose a real-time B-spline based kinodynamic (RBK) search algorithm, which transforms a position-only shortest path search (such as A* and Dijkstra) into an efficient kinodynamic search by exploring the properties of the B-spline parameterization. The RBK search is greedy and efficiently produces a dynamically feasible time-parameterized trajectory, which accommodates a non-static initial state of the quadrotor. To cope with the limitations of the greedy search and the discretization induced by the grid structure, we adopt an elastic optimization (EO) approach as a post-optimization process to refine the control point placement provided by the RBK search. The EO approach finds the optimal control point placement inside an expanded elastic tube representing the free space by solving a Quadratically Constrained Quadratic Programming (QCQP) problem. We design a receding-horizon replanner based on the local control property of B-splines. A systematic comparison of our method against two state-of-the-art methods is provided. We integrate our replanning system with a monocular vision-based quadrotor and validate its performance onboard.
http://arxiv.org/abs/1903.01139
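A property that B-spline-based planners rely on is that a uniform B-spline trajectory is a cheap, closed-form function of its control points and stays inside their convex hull. The sketch below evaluates a uniform cubic B-spline segment in matrix form; it is illustrative background, not the authors' planner.

```python
import numpy as np

# Basis matrix of the uniform cubic B-spline (matrix form of the de Boor
# recursion for equally spaced knots).
M = (1.0 / 6.0) * np.array([[-1.,  3., -3., 1.],
                            [ 3., -6.,  3., 0.],
                            [-3.,  0.,  3., 0.],
                            [ 1.,  4.,  1., 0.]])

def bspline_point(ctrl, i, u):
    """Position on a uniform cubic B-spline at local parameter u in [0, 1) of
    segment i, computed from control points ctrl[i..i+3]. The curve stays in
    the convex hull of its control points, which is what makes feasibility
    checks on a control-point search cheap."""
    U = np.array([u**3, u**2, u, 1.0])
    return U @ M @ ctrl[i:i + 4]

ctrl = np.array([[0., 0.], [1., 2.], [2., 2.], [3., 0.], [4., 1.]])
path = np.array([bspline_point(ctrl, 0, u) for u in np.linspace(0, 1, 20)])
```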
Multinomial logit bandit is a sequential subset selection problem which arises in many applications. In each round, the player selects a $K$-cardinality subset from $N$ candidate items and receives a reward which is governed by a multinomial logit (MNL) choice model considering both item utility and the substitution property among items. The player’s objective is to dynamically learn the parameters of the MNL model and maximize the cumulative reward over a finite horizon $T$. This problem faces the exploration-exploitation dilemma, and its combinatorial nature makes it non-trivial. In recent years, some algorithms have been developed by exploiting specific characteristics of the MNL model, but all of them estimate the parameters of the MNL model separately and incur a regret no better than $\tilde{O}\big(\sqrt{NT}\big)$, which is undesirable for a large candidate set size $N$. In this paper, we consider the linear-utility MNL choice model, whose item utilities are represented as linear functions of $d$-dimensional item features, and propose an algorithm, titled LUMB, that exploits the underlying structure. It is proven that the proposed algorithm achieves $\tilde{O}\big(dK\sqrt{T}\big)$ regret, which is free of the candidate set size. Experiments show the superiority of the proposed algorithm.
http://arxiv.org/abs/1805.02971
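The MNL choice model at the core of this problem is compact enough to state directly: offered a subset $S$, the user picks item $i$ with probability $v_i/(1+\sum_{j\in S} v_j)$ where $v_i = e^{u_i}$, the constant $1$ being the no-purchase option. A NumPy sketch of the model itself (not the LUMB algorithm, whose utilities would be linear in item features):

```python
import numpy as np

def mnl_choice_probs(utilities, subset):
    """Choice probabilities under the multinomial logit model: the user picks
    item i from the offered subset with probability v_i / (1 + sum_j v_j),
    where v_i = exp(u_i); the constant 1 is the no-purchase option."""
    v = np.exp(utilities[subset])
    return v / (1.0 + v.sum())

def expected_reward(utilities, rewards, subset):
    # Expected per-round reward of offering the given subset.
    return float(mnl_choice_probs(utilities, subset) @ rewards[subset])

u = np.array([0.5, 1.0, -0.2, 0.3])   # hypothetical item utilities
r = np.array([1.0, 0.6, 1.2, 0.9])    # hypothetical item rewards
print(expected_reward(u, r, [0, 1]))  # offer items {0, 1}
```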
State-of-the-art neural network models estimate large displacement optical flow in multi-resolution and use warping to propagate the estimation between two resolutions. Despite their impressive results, it is known that there are two problems with the approach. First, the multi-resolution estimation of optical flow fails in situations where small objects move fast. Second, warping creates artifacts when occlusion or dis-occlusion happens. In this paper, we propose a new neural network module, Deformable Cost Volume, which alleviates the two problems. Based on this module, we designed the Deformable Volume Network (Devon) which can estimate multi-scale optical flow in a single high resolution. Experiments show Devon is more suitable in handling small objects moving fast and achieves comparable results to the state-of-the-art methods in public benchmarks.
http://arxiv.org/abs/1802.07351
In this work, we present a novel meta-learning algorithm, TTNet, that regresses model parameters for novel tasks for which no ground truth is available (zero-shot tasks). In order to adapt to novel zero-shot tasks, our meta-learner learns from the model parameters of known tasks (with ground truth) and the correlation of known tasks to zero-shot tasks. This intuition finds its foothold in cognitive science, where a subject (e.g., a human baby) can adapt to a novel concept (depth understanding) by correlating it with old concepts (hand movement or self-motion), without receiving explicit supervision. We evaluated our model on the Taskonomy dataset with four tasks as zero-shot: surface normal, room layout, depth, and camera pose estimation. These tasks were chosen based on the data acquisition complexity and the complexity associated with the learning process using a deep network. Our proposed methodology outperforms state-of-the-art models (which use ground truth) on each of our zero-shot tasks, showing promise for zero-shot task transfer. We also conducted extensive experiments to study the various choices of our methodology, and showed how the proposed method can also be used in transfer learning. To the best of our knowledge, this is the first such effort on zero-shot learning in the task space.
http://arxiv.org/abs/1903.01092
This paper proposes a deep learning-based denoising method for noisy low-dose computed tomography (CT) images in the absence of paired training data. The proposed method uses a fidelity-embedded generative adversarial network (GAN) to learn a denoising function from unpaired training data of low-dose CT (LDCT) and standard-dose CT (SDCT) images, where the denoising function is the optimal generator in the GAN framework. Given an optimal discriminator in the GAN, the generator is optimized by minimizing a weighted sum of two losses: the Kullback-Leibler divergence between an SDCT data distribution and the generated distribution, and the $\ell_2$ loss between the LDCT image and the corresponding generated (i.e., denoised) image. The experimental results show that the proposed deep-learning method with unpaired datasets performs comparably to a method using paired datasets. A clinical experiment was also performed to show the validity of the proposed method for the non-Gaussian noise arising in low-dose X-ray CT.
http://arxiv.org/abs/1903.06257
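The generator objective described above is a weighted sum of an adversarial term and an $\ell_2$ fidelity term. The PyTorch sketch below illustrates the shape of that objective; a standard non-saturating GAN loss stands in for the paper's KL-derived adversarial term, and `G`, `D`, and `lam` are hypothetical placeholders.

```python
import torch

def generator_loss(G, D, ldct, lam=10.0):
    """Sketch of a fidelity-embedded generator objective: an adversarial term
    pushing denoised outputs toward the SDCT distribution, plus an l2 term
    keeping them close to the input LDCT image."""
    denoised = G(ldct)
    adv = -torch.log(torch.sigmoid(D(denoised)) + 1e-8).mean()  # fool the discriminator
    fidelity = torch.mean((denoised - ldct) ** 2)               # l2 data-fidelity term
    return adv + lam * fidelity
```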