Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Joint Entity Linking with Deep Reinforcement Learning

2019-02-01

Zheng Fang, Yanan Cao, Dongjie Zhang, Qian Li, Zhenyu Zhang, Yanbing Liu

arXiv_CL

arXiv_CL Knowledge Reinforcement_Learning
Abstract

Entity linking is the task of aligning mentions to corresponding entities in a given knowledge base. Previous studies have highlighted the necessity for entity linking systems to capture the global coherence. However, there are two common weaknesses in previous global models. First, most of them calculate the pairwise scores between all candidate entities and select the most relevant group of entities as the final result. In this process, the consistency among wrong entities as well as that among right ones is involved, which may introduce noisy data and increase the model complexity. Second, the cues of previously disambiguated entities, which could contribute to the disambiguation of the subsequent mentions, are usually ignored by previous models. To address these problems, we convert the global linking into a sequence decision problem and propose a reinforcement learning model which makes decisions from a global perspective. Our model makes full use of the previously referred entities and explores the long-term influence of the current selection on subsequent decisions. We conduct experiments on different types of datasets; the results show that our model outperforms state-of-the-art systems and has better generalization performance.

URL

http://arxiv.org/abs/1902.00330

PDF

http://arxiv.org/pdf/1902.00330
Multi-Task Learning with a Fully Convolutional Network for Rectum and Rectal Cancer Segmentation

2019-02-01

Joohyung Lee, Ji Eun Oh, Min Ju Kim, Bo Yun Hur, Sun Ah Cho, Dae Kyung Sohn

arXiv_CV

arXiv_CV Segmentation CNN
Abstract

In rectal cancer treatment planning, the locations of the rectum and rectal cancer play an important role. The aim of this study is to propose a fully automatic method to segment both rectum and rectal cancer with axial T2-weighted magnetic resonance images. We present a fully convolutional network for multi-task learning to segment both rectum and rectal cancer. Moreover, we propose an assessment method based on bias-variance decomposition to visualize and measure the regional model robustness of a segmentation network. In addition, we suggest a novel augmentation method which can improve the segmentation performance and reduce the training time. Our proposed method not only is computationally efficient due to its fully convolutional nature but also outperforms the current state-of-the-art in rectal cancer segmentation. It also shows high accuracy in rectum segmentation, for which no previous studies exist. We conclude that rectum information benefits the training of a rectal cancer segmentation model, especially concerning model variance.

URL

http://arxiv.org/abs/1901.07213

PDF

http://arxiv.org/pdf/1901.07213
Rethinking Visual Relationships for High-level Image Understanding

2019-02-01

Yuanzhi Liang, Yalong Bai, Wei Zhang, Xueming Qian, Li Zhu, Tao Mei

arXiv_CV

arXiv_CV Image_Caption Caption Relation VQA
Abstract

Relationships, as the bond of isolated entities in images, reflect the interaction between objects and lead to a semantic understanding of scenes. Because current scene graph datasets suffer from visually-irrelevant relationships, the utilization of relationships for semantic tasks is difficult. The datasets widely used in scene graph generation tasks are split from Visual Genome by label frequency, and can even be solved well by statistical counting. To encourage further development in relationships, we propose a novel method to mine more valuable relationships by automatically filtering out visually-irrelevant relationships. Then, we construct a new scene graph dataset named Visually-Relevant Relationships Dataset (VrR-VG) from Visual Genome. We evaluate several existing methods in scene graph generation on our dataset. The results show that the performances degrade significantly compared to the previous dataset and that frequency analysis no longer works on our dataset. Moreover, we propose a method to learn feature representations of instances, attributes, and visual relationships jointly from images, then we apply the learned features to image captioning and visual question answering respectively. The improvements on both tasks demonstrate the efficiency of the features with relation information and the richer semantic information provided in our dataset.

URL

http://arxiv.org/abs/1902.00313

PDF

http://arxiv.org/pdf/1902.00313
Generative Smoke Removal

2019-02-01

Oleksii Sidorov, Congcong Wang, Faouzi Alaya Cheikh

arXiv_CV

arXiv_CV GAN
Abstract

In minimally invasive surgery, the use of tissue dissection tools causes smoke, which inevitably degrades the image quality. This could reduce the visibility of the operation field for surgeons and introduce errors for the computer vision algorithms used in surgical navigation systems. In this paper, we propose a novel approach for computational smoke removal using supervised image-to-image translation. We demonstrate that straightforward application of existing generative algorithms allows removing smoke but decreases image quality and introduces synthetic noise (grid-structure). Thus, we propose to solve this issue by modifying the GAN’s architecture and adding a perceptual image quality metric to the loss function. The obtained results demonstrate that the proposed method efficiently removes smoke and preserves perceptually sufficient image quality.

URL

http://arxiv.org/abs/1902.00311

PDF

http://arxiv.org/pdf/1902.00311
Deep Hyperspectral Prior: Denoising, Inpainting, Super-Resolution

2019-02-01

Oleksii Sidorov, Jon Yngve Hardeberg

arXiv_CV

arXiv_CV Super_Resolution CNN Deep_Learning
Abstract

Deep learning algorithms have demonstrated state-of-the-art performance in various tasks of image restoration. This was made possible through the ability of CNNs to learn from large exemplar sets. However, the latter becomes an issue for hyperspectral image processing where datasets commonly consist of just a few images. In this work, we propose a new approach to denoising, inpainting, and super-resolution of hyperspectral image data using intrinsic properties of a CNN without any training. The performance of the given algorithm is shown to be comparable to the performance of trained networks, while its application is not restricted by the availability of training data. This work is an extension of the original “deep prior” algorithm to the HSI domain and 3D-convolutional networks.

URL

http://arxiv.org/abs/1902.00301

PDF

http://arxiv.org/pdf/1902.00301
End-to-end Lane Detection through Differentiable Least-Squares Fitting

2019-02-01

Bert De Brabandere, Wouter Van Gansbeke, Davy Neven, Marc Proesmans, Luc Van Gool

arXiv_CV

arXiv_CV Object_Detection Segmentation GAN Detection
Abstract

Lane detection is typically tackled with a two-step pipeline in which a segmentation mask of the lane markings is predicted first, and a lane line model (like a parabola or spline) is fitted to the post-processed mask next. The problem with such a two-step approach is that the parameters of the network are not optimized for the true task of interest (estimating the lane curvature parameters) but for a proxy task (segmenting the lane markings), resulting in sub-optimal performance. In this work, we propose a method to train a lane detector in an end-to-end manner, directly regressing the lane parameters. The architecture consists of two components: a deep network that predicts a segmentation-like weight map for each lane line, and a differentiable least-squares fitting module that returns for each map the parameters of the best-fitting curve in the weighted least-squares sense. These parameters can subsequently be supervised with a loss function of choice. Our method relies on the observation that it is possible to backpropagate through a least-squares fitting procedure. This leads to an end-to-end method where the features are optimized for the true task of interest: the network implicitly learns to generate features that prevent instabilities during the model fitting step, as opposed to two-step pipelines that need to handle outliers with heuristics. Additionally, the system is not just a black box but offers a degree of interpretability because the intermediately generated segmentation-like weight maps can be inspected and visualized. Code and a video are available at github.com/wvangansbeke/LaneDetection_End2End.
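The weighted least-squares fit at the heart of this abstract has a closed-form solution, which is why gradients can pass through it. The following is a minimal sketch in plain Python, not the authors' implementation: the function name `weighted_parabola_fit`, the parabola model x = a*y^2 + b*y + c, and the sample weights are all assumptions made for illustration. In the paper's setting the weights would come from the network's weight map and an autodiff framework would supply the gradients.

```python
def weighted_parabola_fit(ys, xs, weights):
    """Fit x = a*y^2 + b*y + c by weighted least squares (hypothetical helper)."""
    # Accumulate the weighted normal equations (A^T W A) p = A^T W x,
    # where each row of A is [y^2, y, 1].
    m = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for y, x, w in zip(ys, xs, weights):
        row = (y * y, y, 1.0)
        for i in range(3):
            rhs[i] += w * row[i] * x
            for j in range(3):
                m[i][j] += w * row[i] * row[j]

    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    aug = [m[i] + [rhs[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= factor * aug[col][k]

    # Back substitution yields the curve parameters (a, b, c).
    p = [0.0] * 3
    for i in (2, 1, 0):
        s = aug[i][3] - sum(aug[i][k] * p[k] for k in range(i + 1, 3))
        p[i] = s / aug[i][i]
    return tuple(p)


# Points lying exactly on x = 0.5*y^2 - 2*y + 3, with arbitrary positive weights.
ys = [0.0, 1.0, 2.0, 3.0, 4.0]
xs = [0.5 * y * y - 2.0 * y + 3.0 for y in ys]
a, b, c = weighted_parabola_fit(ys, xs, [1.0, 0.2, 0.7, 0.4, 0.9])
```

Every step above is built from additions, multiplications and divisions, so the fitted parameters are a differentiable function of the weights wherever the system is well conditioned.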

URL

http://arxiv.org/abs/1902.00293

PDF

http://arxiv.org/pdf/1902.00293
Causal Simulations for Uplift Modeling

2019-02-01

Jeroen Berrevoets, Wouter Verbeke

arXiv_AI

arXiv_AI GAN Relation
Abstract

Uplift modeling requires experimental data, preferably collected in random fashion. This places a logistical and financial burden upon any organisation aspiring to such models. Once deployed, uplift models are subject to effects from concept drift. Hence, methods are being developed that are able to learn from newly gained experience, as well as handle drifting environments. As these new methods attempt to eliminate the need for experimental data, another approach to test such methods must be formulated. Therefore, we propose a method to simulate environments that offer causal relationships in their parameters.

URL

http://arxiv.org/abs/1902.00287

PDF

http://arxiv.org/pdf/1902.00287
Flexible collaborative transportation by a team of rotorcraft

2019-02-01

Hector Garcia de Marina, Ewoud Smeur

arXiv_RO

arXiv_RO Tracking
Abstract

We propose a combined method for the collaborative transportation of a suspended payload by a team of rotorcraft. A recent distance-based formation-motion control algorithm based on assigning distance disagreements among robots generates the acceleration signals to be tracked by the vehicles. In particular, the proposed method needs neither global positions nor the tracking of prescribed trajectories for the motion of the members of the team. The acceleration signals are followed accurately by an Incremental Nonlinear Dynamic Inversion controller designed for rotorcraft that measures and resists the tensions from the payload. Our approach allows us to analyze the involved accelerations and forces in the system so that we can calculate the worst case conditions explicitly to guarantee a nominal performance, provided that the payload starts at rest in the 2D centroid of the formation, and it is not under significant disturbances. For example, we can calculate the maximum safe deformation of the team with respect to its desired shape. We demonstrate our method with a team of four rotorcraft carrying a suspended object two times heavier than the maximum payload for an individual. Last but not least, our proposed algorithm is available for the community in the open-source autopilot Paparazzi.

URL

http://arxiv.org/abs/1902.00279

PDF

http://arxiv.org/pdf/1902.00279
Deep Learning Solutions for TanDEM-X-based Forest Classification

2019-02-01

Antonio Mazza, Francescopaolo Sica

arXiv_CV

arXiv_CV Object_Detection Image_Classification Classification Deep_Learning Detection
Abstract

In the last few years, deep learning (DL) has been successfully and massively employed in computer vision for discriminative tasks, such as image classification or object detection. These kinds of problems are core to many remote sensing (RS) applications as well, though with domain-specific peculiarities. Therefore, there is a growing interest in the use of DL methods for RS tasks. Here, we consider the forest/non-forest classification problem with TanDEM-X data, and test two state-of-the-art DL models, suitably adapting them to the specific task. Our experiments confirm the great potential of DL methods for RS applications.

URL

http://arxiv.org/abs/1902.00274

PDF

http://arxiv.org/pdf/1902.00274
Instance Segmentation as Image Segmentation Annotation

2019-02-01

Thomio Watanabe, Denis Wolf

arXiv_CV

arXiv_CV Segmentation CNN
Abstract

The instance segmentation problem intends to precisely detect and delineate objects in images. Most of the current solutions rely on deep convolutional neural networks but, despite this fact, the proposed solutions are very diverse. Some solutions approach the problem as a network problem, where they use several networks or specialize a single network to solve several tasks. A different approach tries to solve the problem as an annotation problem, where the instance information is encoded in a mathematical representation. This work proposes a solution based on the DCME technique to solve instance segmentation with a single segmentation network. Unlike other approaches, the segmentation network decoder is not specialized in a multi-task network. Instead, the network encoder is repurposed to classify image objects, reducing the computational cost of the solution.

URL

http://arxiv.org/abs/1902.05498

PDF

http://arxiv.org/pdf/1902.05498
Adaptive Gradient Refinement for Adversarial Perturbation Generation

2019-02-01

Yatie Xiao, Chi-Man Pun, Xia Du, Jizhe Zhou

arXiv_CV

arXiv_CV Adversarial Image_Classification Classification Prediction
Abstract

Deep neural networks have achieved remarkable success in computer vision, natural language processing, and audio tasks. However, in classification domains, research has shown that deep neural models are easily fooled into making different or wrong classification predictions, which may cause severe results. Many attack methods generate adversarial perturbations with large-scale pixel modification and low cosine similarity between the original and the corresponding adversarial examples. To address these issues, we propose an adversarial method that adaptively adjusts the perturbation strength and updates the gradient direction to generate attacks. It generates perturbation tensors by adjusting their strength adaptively, and it updates the gradient direction by combining it with the previously calculated gradient history, which can escape local minima or maxima. In this paper, we compare several traditional perturbation-creation methods in image classification with ours. Experimental results show that our approach works well, outperforms recent techniques in causing images to be misclassified, and is highly efficient at fooling deep network models.

URL

http://arxiv.org/abs/1902.01220

PDF

http://arxiv.org/pdf/1902.01220
ColorNet: Investigating the importance of color spaces for image classification

2019-02-01

Shreyank N Gowda, Chun Yuan

arXiv_CV

arXiv_CV Image_Classification Classification
Abstract

Image classification is a fundamental application in computer vision. Recently, deeper networks and highly connected networks have shown state-of-the-art performance for image classification tasks. Most datasets these days consist of a finite number of color images. These color images are taken as input in the form of RGB images and classification is done without modifying them. We explore the importance of color spaces and show that color spaces (essentially transformations of original RGB images) can significantly affect classification accuracy. Further, we show that certain classes of images are better represented in particular color spaces and for a dataset with a highly varying number of classes such as CIFAR and ImageNet, using a model that considers multiple color spaces within the same model gives excellent levels of accuracy. Also, we show that such a model, where the input is preprocessed into multiple color spaces simultaneously, needs far fewer parameters to obtain high accuracy for classification. For example, our model with 1.75M parameters significantly outperforms DenseNet 100-12 that has 12M parameters and gives results comparable to DenseNet-BC-190-40 that has 25.6M parameters for classification of four competitive image classification datasets, namely CIFAR-10, CIFAR-100, SVHN and ImageNet. Our model essentially takes an RGB image as input, simultaneously converts the image into 7 different color spaces and uses these as inputs to individual DenseNets. We use small and wide DenseNets to reduce computation overhead and the number of hyperparameters required. We obtain significant improvement on current state-of-the-art results on these datasets as well.
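To make the preprocessing idea concrete, here is a small sketch that converts one RGB pixel into several color spaces at once. It is an illustration only: the abstract does not name the seven color spaces the model uses, so this example simply takes the three conversions available in Python's standard `colorsys` module, and the helper name `to_color_spaces` is hypothetical.

```python
import colorsys


def to_color_spaces(r, g, b):
    """Return one RGB pixel (channels in [0, 1]) expressed in several color spaces."""
    return {
        "rgb": (r, g, b),
        "hsv": colorsys.rgb_to_hsv(r, g, b),  # hue, saturation, value
        "hls": colorsys.rgb_to_hls(r, g, b),  # hue, lightness, saturation
        "yiq": colorsys.rgb_to_yiq(r, g, b),  # luma plus two chroma channels
    }


# Pure red expressed in each space; each tuple could feed a separate sub-network.
spaces = to_color_spaces(1.0, 0.0, 0.0)
```

Applied to every pixel, each dictionary entry becomes a differently encoded copy of the same image, which is the kind of multi-input arrangement the abstract describes.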

URL

http://arxiv.org/abs/1902.00267

PDF

http://arxiv.org/pdf/1902.00267
EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification

2019-02-01

Patrick Helber, Benjamin Bischke, Andreas Dengel, Damian Borth

arXiv_CV

arXiv_CV CNN Classification Deep_Learning
Abstract

In this paper, we address the challenge of land use and land cover classification using Sentinel-2 satellite images. The Sentinel-2 satellite images are openly and freely accessible and are provided in the Earth observation program Copernicus. We present a novel dataset based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with a total of 27,000 labeled and geo-referenced images. We provide benchmarks for this novel dataset with its spectral bands using state-of-the-art deep Convolutional Neural Networks (CNNs). With the proposed novel dataset, we achieved an overall classification accuracy of 98.57%. The resulting classification system opens a gate towards a number of Earth observation applications. We demonstrate how this classification system can be used for detecting land use and land cover changes and how it can assist in improving geographical maps. The geo-referenced dataset EuroSAT is made publicly available at https://github.com/phelber/eurosat.

URL

http://arxiv.org/abs/1709.00029

PDF

http://arxiv.org/pdf/1709.00029
Natural and Adversarial Error Detection using Invariance to Image Transformations

2019-02-01

Yuval Bahat, Michal Irani, Gregory Shakhnarovich

arXiv_CV

arXiv_CV Adversarial Object_Detection Knowledge Image_Classification Classification Detection
Abstract

We propose an approach to distinguish between correct and incorrect image classifications. Our approach can detect misclassifications which either occur unintentionally (“natural errors”), or due to intentional adversarial attacks (“adversarial errors”), both in a single unified framework. Our approach is based on the observation that correctly classified images tend to exhibit robust and consistent classifications under certain image transformations (e.g., horizontal flip, small image translation, etc.). In contrast, incorrectly classified images (whether due to adversarial errors or natural errors) tend to exhibit large variations in classification results under such transformations. Our approach does not require any modifications or retraining of the classifier, hence can be applied to any pre-trained classifier. We further use state-of-the-art targeted adversarial attacks to demonstrate that even when the adversary has full knowledge of our method, the adversarial distortion needed for bypassing our detector is no longer imperceptible to the human eye. Our approach obtains state-of-the-art results compared to previous adversarial detection methods, surpassing them by a large margin.
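The core observation, that correct predictions tend to survive small image transformations, can be illustrated with a toy consistency score. This is a simplified proxy and not the paper's detector, whose details the abstract does not give. The images are nested lists, and the two toy classifiers and all function names are assumptions made for the example.

```python
def horizontal_flip(image):
    """Mirror each row of a 2D image stored as a list of lists."""
    return [row[::-1] for row in image]


def shift_right(image):
    """Translate the image one pixel to the right, repeating the left edge."""
    return [[row[0]] + row[:-1] for row in image]


def consistency_score(classify, image, transforms):
    """Fraction of transformed copies that keep the original predicted label."""
    base = classify(image)
    agree = sum(1 for t in transforms if classify(t(image)) == base)
    return agree / len(transforms)


def brightness_classifier(image):
    # Toy classifier whose decision depends only on mean brightness.
    pixels = [p for row in image for p in row]
    return "bright" if sum(pixels) / len(pixels) > 0.4 else "dark"


def side_classifier(image):
    # Toy classifier that depends on which half is brighter, so a flip changes it.
    half = len(image[0]) // 2
    left = sum(sum(row[:half]) for row in image)
    right = sum(sum(row[half:]) for row in image)
    return "left" if left > right else "right"


image = [[0.9, 0.8, 0.1, 0.0],
         [1.0, 0.7, 0.2, 0.1]]
transforms = [horizontal_flip, shift_right]
stable = consistency_score(brightness_classifier, image, transforms)
fragile = consistency_score(side_classifier, image, transforms)
```

A low score flags a prediction as suspicious without touching the classifier itself, which matches the abstract's point that no retraining is needed.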

URL

http://arxiv.org/abs/1902.00236

PDF

http://arxiv.org/pdf/1902.00236
Achieving sub-1 Ohm-mm Non-Recess S/D Contact Resistance in GaN HEMTs Utilizing Simple CMOS Compatible La/Ti/Al/Ti Metal Contacts

2019-02-01

Xinpeng Lin, Yumeng Zhu, Yongle Qi, Guangnan Zhou, Wenmao Li, Yang Jiang, Jian Zhang, Robert Sokolovskj, Yulong Jiang, Guangrui (Maggie) Xia, Mengyuan Hua, Hongyu Yu

arXiv_CV

arXiv_CV GAN Face
Abstract

In this paper, we report the use of lanthanum (La) in S/D contacts of GaN HEMTs, achieving 0.97 Ohm-mm contact resistance without S/D recess. The HEMTs show well-behaved electrical characteristics and satisfactory reliability. Our studies show that La, a CMOS compatible metal, is promising to lower GaN HEMT S/D contact resistance. La’s low work function (3.5 eV) is beneficial for reducing the barrier between the metals and GaN. The Ohmic contact formation mechanism involved was shown to be different from conventional Ti/Al films. Spherical-shaped high-La regions formed near the surface during annealing. La diffuses into the AlGaN layer, and the overlap of La and Al peaks is significantly increased compared with that before annealing.

URL

https://arxiv.org/abs/1902.00227

PDF

https://arxiv.org/pdf/1902.00227
A Classification Supervised Auto-Encoder Based on Predefined Evenly-Distributed Class Centroids

2019-02-01

Qiuyu Zhu, Ruixin Zhang

arXiv_CV

arXiv_CV Classification Gradient_Descent
Abstract

Classic autoencoders and variational autoencoders (VAEs) are used to learn complex data distributions and are built on standard function approximators, such as neural networks, which can be trained by stochastic gradient descent methods. In particular, VAEs have shown promise on many complex tasks. In this paper, a new autoencoder model, the classification supervised autoencoder (CSAE) based on predefined evenly-distributed class centroids (PEDCC), is proposed. To carry out supervised learning for the autoencoder, we use the PEDCC of the latent variables to train the network to ensure the maximization of inter-class distance and the minimization of intra-class distance. Instead of learning the mean/variance of the latent variable distribution and applying the reparameterization of the VAE, the latent variables of CSAE are used directly for classification and as the input of the decoder. In addition, a new loss function is proposed to combine the loss function of classification, the loss function of image codec quality and the loss function for enhancing the subjective quality of the decoded image. Based on the basic structure of the universal autoencoder, we achieved comprehensively optimal results for encoding, decoding and classification, and good model generalization performance at the same time. Theoretical advantages are reflected in experimental results.

URL

http://arxiv.org/abs/1902.00220

PDF

http://arxiv.org/pdf/1902.00220
Ablation of a Robot's Brain: Neural Networks Under a Knife

2019-02-01

Peter E. Lillian, Richard Meyes, Tobias Meisen

arXiv_CV

arXiv_CV Knowledge
Abstract

It is still not fully understood exactly how neural networks are able to solve the complex tasks that have recently pushed AI research forward. We present a novel method for determining how information is structured inside a neural network. Using ablation (a neuroscience technique for cutting away parts of a brain to determine their function), we approach several neural network architectures from a biological perspective. Through an analysis of this method’s results, we examine important similarities between biological and artificial neural networks to search for the implicit knowledge locked away in the network’s weights.

URL

https://arxiv.org/abs/1812.05687

PDF

https://arxiv.org/pdf/1812.05687
Geometric interpretation of the general POE model for a serial-link robot via conversion into D-H parameterization

2019-02-01

Liao Wu, Ross Crawford, Jonathan Roberts

arXiv_RO

arXiv_RO
Abstract

While the Product of Exponentials (POE) formula has been gaining increasing popularity in modeling the kinematics of a serial-link robot, the Denavit-Hartenberg (D-H) notation is still the most widely used due to its intuitive and concise geometric interpretation of the robot. This paper has developed an analytical solution to automatically convert a POE model into a D-H model for a robot with revolute, prismatic, and helical joints, which are the complete set of three basic one degree of freedom lower pair joints for constructing a serial-link robot. The conversion algorithm developed can be used in applications such as calibration where it is necessary to convert the D-H model to the POE model for identification and then back to the D-H model for compensation. The equivalence of the two models proved in this paper also benefits the analysis of the identifiability of the kinematic parameters. It is found that the maximum number of identifiable parameters in a general POE model is 5h + 4r + 2t + n + 6, where h, r, t, and n stand for the number of helical, revolute, prismatic, and general joints, respectively. It is also suggested that the identifiability of the base frame and the tool frame in the D-H model is restricted rather than the arbitrary six parameters as assumed previously.
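The identifiability bound quoted in the abstract, 5h + 4r + 2t + n + 6, is easy to evaluate for a given joint layout. The helper below is a small worked example; the function name and keyword arguments are chosen here for illustration and do not come from the paper.

```python
def max_identifiable_parameters(helical=0, revolute=0, prismatic=0, general=0):
    """Evaluate 5h + 4r + 2t + n + 6 for the given joint counts."""
    return 5 * helical + 4 * revolute + 2 * prismatic + general + 6


# A serial arm with six revolute joints: 4 * 6 + 6 = 30.
six_revolute = max_identifiable_parameters(revolute=6)

# Three revolute joints plus one prismatic joint: 4 * 3 + 2 * 1 + 6 = 20.
mixed = max_identifiable_parameters(revolute=3, prismatic=1)
```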

URL

http://arxiv.org/abs/1902.00198

PDF

http://arxiv.org/pdf/1902.00198
Multilingual NER Transfer for Low-resource Languages

2019-02-01

Afshin Rahimi, Yuan Li, Trevor Cohn

arXiv_CL

arXiv_CL Inference Recognition
Abstract

In massively multilingual transfer, NLP models over many source languages are applied to a low-resource target language. In contrast to most prior work, which uses a single model or a small handful, we consider many such models, which raises the critical problem of poor transfer, particularly from distant languages. We propose two techniques for modulating the transfer: one based on unsupervised truth inference, and another using limited supervision in the target language. Evaluating on named entity recognition over 41 languages, we show that our techniques are much more effective than strong baselines, including standard ensembling, and our unsupervised method rivals oracle selection of the single best individual model.

URL

http://arxiv.org/abs/1902.00193

PDF

http://arxiv.org/pdf/1902.00193
Thermal Recovery of Multi-Limbed Robots with Electric Actuators

2019-02-01

Steven Jens Jorgensen, James Holley, Frank Mathis, Luis Sentis

arXiv_RO

arXiv_RO
Abstract

The problem of finding thermally minimizing configurations of a humanoid robot to recover its actuators from unsafe thermal states is addressed. A first-order, data-driven, effort-based, thermal model of the robot’s actuators is devised, which is used to predict future thermal states. Given this predictive capability, a map between configurations and future temperatures is formulated to find what configurations, subject to valid contact constraints, can be taken now to minimize future thermal states. Effectively, this approach is a realization of a contact-constrained thermal inverse-kinematics (IK) process. Experimental validation of the proposed approach is performed on the NASA Valkyrie robot hardware.

URL

http://arxiv.org/abs/1902.00187

PDF

http://arxiv.org/pdf/1902.00187
2D and 3D Vascular Structures Enhancement via Multiscale Fractional Anisotropy Tensor

2019-02-01

Haifa F. Alhasson, Shuaa S. Alharbi, Boguslaw Obara

arXiv_CV

arXiv_CV Quantitative Detection
Abstract

The detection of vascular structures from noisy images is a fundamental process for extracting meaningful information in many applications. Most well-known vascular enhancing techniques often rely on Hessian-based filters. This paper investigates the feasibility and deficiencies of detecting curve-like structures using a Hessian matrix. The main contribution is a novel enhancement function, which overcomes the deficiencies of established methods. Our approach has been evaluated quantitatively and qualitatively using synthetic examples and a wide range of real 2D and 3D biomedical images. Compared with other existing approaches, the experimental results prove that our proposed approach achieves high-quality curvilinear structure enhancement.

URL

http://arxiv.org/abs/1902.00550

PDF

http://arxiv.org/pdf/1902.00550
Comparison of Patch-Based Conditional Generative Adversarial Neural Net Models with Emphasis on Model Robustness for Use in Head and Neck Cases for MR-Only planning

2019-02-01

Peter Klages, Ilyes Bensilmane, Sadegh Ryahi, Jue Jiang, Margie Hunt, Joe Deasy, Harini Veeraraghavan, Neelam Tyagi

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

A total of twenty paired CT and MR images were used in this study to investigate two conditional generative adversarial networks, Pix2Pix and Cycle GAN, for generating synthetic CT (sCT) images for head and neck cancer cases. Ten of the patient cases were used for training and included such common artifacts as dental implants; the remaining ten cases were used for testing and included a larger range of image features commonly found in clinical head and neck cases. These features included strong metal artifacts from dental implants, one case with a metal implant, and one case with abnormal anatomy. The original CT images were deformably registered to the mDixon FFE MR images to minimize the effects of processing the MR images. The sCT generation accuracy and robustness were evaluated using Mean Absolute Error (MAE) based on the Hounsfield Units (HU) for three regions (whole body, bone, and air within the body), Mean Error (ME) to observe systematic average offset errors in the sCT generation, and dosimetric evaluation of all clinically relevant structures. For the test set, the MAE for the Pix2Pix and Cycle GAN models were 92.4 ± 13.5 HU and 100.7 ± 14.6 HU, respectively, for the body region, 166.3 ± 31.8 HU and 184 ± 31.9 HU, respectively, for the bone region, and 183.7 ± 41.3 HU and 185.4 ± 37.9 HU, respectively, for the air regions. The ME for Pix2Pix and Cycle GAN were 21.0 ± 11.8 HU and 37.5 ± 14.9 HU, respectively. Absolute Percent Mean/Max Dose Errors were less than 2% for the PTV and all critical structures for both models, and DRRs generated from these models looked qualitatively similar to CT-generated DRRs, showing that these methods are promising for MR-only planning.
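The two error metrics named in the abstract differ only in whether the sign of each voxel difference is kept. A minimal sketch on flat lists of Hounsfield Unit values follows; the function name, the boolean region mask, and the sign convention (synthetic minus reference) are assumptions for illustration, and the numbers are made up rather than taken from the study.

```python
def mae_and_me(synthetic_hu, reference_hu, mask):
    """Mean Absolute Error and Mean Error over the voxels selected by mask."""
    # Signed differences, restricted to the region of interest (e.g. body or bone).
    diffs = [s - r for s, r, m in zip(synthetic_hu, reference_hu, mask) if m]
    if not diffs:
        raise ValueError("mask selects no voxels")
    mae = sum(abs(d) for d in diffs) / len(diffs)  # magnitude of the error
    me = sum(diffs) / len(diffs)                   # systematic offset
    return mae, me


# Four illustrative voxels; the last one lies outside the region and is ignored.
synthetic = [10.0, -30.0, 50.0, 999.0]
reference = [0.0, 0.0, 0.0, 0.0]
region = [True, True, True, False]
mae, me = mae_and_me(synthetic, reference, region)
```

A large MAE with an ME near zero would indicate errors that cancel on average, while an ME close to the MAE points to a systematic offset in one direction.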

URL

http://arxiv.org/abs/1902.00536

PDF

http://arxiv.org/pdf/1902.00536
How to Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions

2019-02-01

Goran Glavas, Robert Litschko, Sebastian Ruder, Ivan Vulic

arXiv_CL

arXiv_CL Embedding
Abstract

Cross-lingual word embeddings (CLEs) enable multilingual modeling of meaning and facilitate cross-lingual transfer of NLP models. Despite their ubiquitous usage in downstream tasks, recent increasingly popular projection-based CLE models are almost exclusively evaluated on a single task only: bilingual lexicon induction (BLI). Even BLI evaluations vary greatly, hindering our ability to correctly interpret performance and properties of different CLE models. In this work, we make the first step towards a comprehensive evaluation of cross-lingual word embeddings. We thoroughly evaluate both supervised and unsupervised CLE models on a large number of language pairs in the BLI task and three downstream tasks, providing new insights concerning the ability of cutting-edge CLE models to support cross-lingual NLP. We empirically demonstrate that the performance of CLE models largely depends on the task at hand and that optimizing CLE models for BLI can result in deteriorated downstream performance. We indicate the most robust supervised and unsupervised CLE models and emphasize the need to reassess existing baselines, which still display competitive performance across the board. We hope that our work will catalyze further work on CLE evaluation and model analysis.

URL

http://arxiv.org/abs/1902.00508

PDF

http://arxiv.org/pdf/1902.00508
A Simple Regularization-based Algorithm for Learning Cross-Domain Word Embeddings

2019-02-01

Wei Yang, Wei Lu, Vincent W. Zheng

arXiv_CL

arXiv_CL Regularization Attention Embedding
Abstract

Learning word embeddings has received a significant amount of attention recently. Often, word embeddings are learned in an unsupervised manner from a large collection of text. The genre of the text typically plays an important role in the effectiveness of the resulting embeddings. How to effectively train word embedding models using data from different domains remains a problem that is underexplored. In this paper, we present a simple yet effective method for learning word embeddings based on text from different domains. We demonstrate the effectiveness of our approach through extensive experiments on various down-stream NLP tasks.

URL

http://arxiv.org/abs/1902.00184

PDF

http://arxiv.org/pdf/1902.00184
Fast and Optimal Laplacian Solver for Gradient-Domain Image Editing using Green Function Convolution

2019-02-01

Dominique Beaini, Sofiane Achiche, Fabrice Nonez, Olivier Brochu Dufour, Cédric Leblond-Ménard, Mahdis Asaadi, Maxime Raison

arXiv_CV

arXiv_CV Detection
Abstract

In computer vision, the gradient and Laplacian of an image are used in many different applications, such as edge detection, feature extraction and seamless image cloning. Obtaining the gradient of an image requires numerical derivatives, which are available in most computer vision toolboxes. However, the reverse problem is more difficult, since computing an image from its gradient requires solving the Laplacian differential equation. The problem with existing methods is that they provide a solution that is prone to high numerical errors, and that they are either slow or require heavy parallel computing. The objective of this paper is to present a novel, fast and robust method of computing the image from its gradient or Laplacian with minimal error, which can be used for gradient-domain editing. By using a single convolution based on Green’s function, the whole process is faster and easier to implement. It can also be optimized on a GPU using fast Fourier transforms and can easily be generalized to an n-dimensional image. The tests show that the gradient solver takes around 2 milliseconds (ms) to reconstruct an image of 801x1200 pixels, compared to between 6 ms and 3000 ms for competing methods. Furthermore, it is proven mathematically that the proposed method gives the optimal result when a perturbation is added, meaning that it always produces the least-error solution for gradient-domain editing. Finally, the developed method is validated with examples of Poisson blending, gradient removal, edge-preserving blurring and edge-preserving painting effect.
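
The idea of inverting the Laplacian with a single convolution can be sketched in one dimension. This is a simplified illustration, not the paper's solver: it assumes a periodic 1-D signal, uses a naive DFT instead of an FFT, and all function names are hypothetical. Dividing by the Laplacian's eigenvalues in the Fourier domain is equivalent to a circular convolution with the discrete Green's function.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for t, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(total / n if inverse else total)
    return out


def laplacian_1d(signal):
    """Periodic second difference: f[i-1] - 2 f[i] + f[i+1]."""
    n = len(signal)
    return [signal[(i - 1) % n] - 2 * signal[i] + signal[(i + 1) % n]
            for i in range(n)]


def solve_poisson_1d(lap, mean=0.0):
    """Recover a periodic signal from its Laplacian.

    The Laplacian discards the mean, so it has to be supplied separately.
    """
    n = len(lap)
    spectrum = dft(lap)
    solved = [0j] * n
    solved[0] = mean * n  # zero-frequency term carries the mean
    for k in range(1, n):
        eigenvalue = 2.0 * math.cos(2.0 * math.pi * k / n) - 2.0
        solved[k] = spectrum[k] / eigenvalue  # Green's function in Fourier space
    return [z.real for z in dft(solved, inverse=True)]
```

For example, `solve_poisson_1d(laplacian_1d(signal), mean=sum(signal) / len(signal))` should reproduce `signal` up to floating-point error. The 2-D image case follows the same pattern with a 2-D transform and the corresponding eigenvalues.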

URL

http://arxiv.org/abs/1902.00176

PDF

http://arxiv.org/pdf/1902.00176
Dating Documents using Graph Convolution Networks

2019-02-01

Shikhar Vashishth, Shib Sankar Dasgupta, Swayambhu Nath Ray, Partha Talukdar

arXiv_AI

arXiv_AI Knowledge Summarization CNN Inference Deep_Learning Detection
Abstract

Document date is essential for many important tasks, such as document retrieval, summarization, event detection, etc. While existing approaches for these tasks assume accurate knowledge of the document date, this is not always available, especially for arbitrary documents from the Web. Document dating is a challenging problem which requires inference over the temporal structure of the document. Prior document dating systems have largely relied on handcrafted features while ignoring such document-internal structures. In this paper, we propose NeuralDater, a Graph Convolutional Network (GCN) based document dating approach which jointly exploits syntactic and temporal graph structures of a document in a principled way. To the best of our knowledge, this is the first application of deep learning to the problem of document dating. Through extensive experiments on real-world datasets, we find that NeuralDater significantly outperforms the state-of-the-art baseline by 19% absolute (45% relative) accuracy points.
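
The absolute and relative gains quoted above jointly imply approximate accuracy levels, assuming both figures refer to the same baseline accuracy (an assumption; the abstract does not say so, and the numbers may be averages over datasets). With $a_0$ the baseline accuracy and $a_1$ that of NeuralDater, both symbols introduced here for illustration:

```latex
\begin{align*}
a_1 - a_0 &= 19 \text{ points}, & \frac{a_1 - a_0}{a_0} &= 0.45 \\
a_0 &= \frac{19}{0.45} \approx 42.2\%, & a_1 &\approx 42.2\% + 19\% = 61.2\%
\end{align*}
```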

URL

http://arxiv.org/abs/1902.00175

PDF

http://arxiv.org/pdf/1902.00175
Dataset Culling: Towards Efficient Training Of Distillation-Based Domain Specific Models

2019-02-01

Kentaro Yoshioka, Edward Lee, Simon Wong, Mark Horowitz

arXiv_CV

arXiv_CV Object_Detection Inference Detection
Abstract

Real-time CNN-based object detection models for applications like surveillance can achieve high accuracy but require extensive computations. Recent work has shown 10 to 100x reduction in computation cost with domain-specific network settings. However, this prior work focused on inference only: if the domain network requires frequent retraining, training and retraining costs can be a significant bottleneck. To address training costs, we propose Dataset Culling: a pipeline to significantly reduce the required training dataset size for domain-specific models. Dataset Culling reduces the dataset size by filtering out non-essential data for training, and reducing the size of each image until detection degrades. Both of these operations use a confusion loss metric which enables us to execute the culling with minimal computation overhead. On a custom long-duration dataset, we show that Dataset Culling can reduce the training costs 47x with no accuracy loss or even with slight improvements. Code is available at https://github.com/kentaroy47/DatasetCulling

URL

http://arxiv.org/abs/1902.00173

PDF

http://arxiv.org/pdf/1902.00173
DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension

2019-02-01

Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Yejin Choi, Claire Cardie

arXiv_CL

arXiv_CL Knowledge Face
Abstract

We present DREAM, the first dialogue-based multiple-choice reading comprehension dataset. Collected from English-as-a-foreign-language examinations designed by human experts to evaluate the comprehension level of Chinese learners of English, our dataset contains 10,197 multiple-choice questions for 6,444 dialogues. In contrast to existing reading comprehension datasets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding. DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence, and 34% of questions also involve commonsense knowledge. We apply several popular neural reading comprehension models that primarily exploit surface information within the text and find them to, at best, just barely outperform a rule-based approach. We next investigate the effects of incorporating dialogue structure and different kinds of general world knowledge into both rule-based and (neural and non-neural) machine learning-based reading comprehension models. Experimental results on the DREAM dataset show the effectiveness of dialogue structure and general world knowledge. DREAM will be available at https://dataset.org/dream/.

URL

http://arxiv.org/abs/1902.00164

PDF

http://arxiv.org/pdf/1902.00164
Lift-the-Flap: Context Reasoning Using Object-Centered Graphs

2019-02-01

Mengmi Zhang, Jiashi Feng, Karla Montejo, Joseph Kwon, Joo Hwee Lim, Gabriel Kreiman

arXiv_AI

arXiv_AI Segmentation Reinforcement_Learning Semantic_Segmentation Inference Classification Recognition
Abstract

Children benefit from lift-the-flap books by taking on an active role in guessing what is behind the flap based on the context. In this paper, we introduce lift-the-flap games for computational models. The task is to reason about the scene context and infer what the target behind the flap is in a natural image. Context reasoning is critical in many computer vision applications, such as object recognition and semantic segmentation. To tackle this problem, we propose an object-centered graph representing the scene configuration of the image where each node corresponds to a group of objects belonging to the same category. To infer the target’s class label, we introduce an object-centered graph network model consisting of two sub-networks. The classification sub-network takes the complete graph as input and outputs a classification vector assigning the probability for each class. The reinforcement learning sub-network exploits the class label dependencies and learns the joint probability among objects in order to generate multiple reasonable answers for the missing target. To evaluate our model’s performance, we carry out human behavioral experiments for lift-the-flap games as a benchmark. Our model makes reasonable inferences compared to humans, and significantly outperforms all the null models. We also demonstrate the usefulness of our object-centered graph network model in context-aware object recognition and target priming in visual search.

URL

http://arxiv.org/abs/1902.00163

PDF

http://arxiv.org/pdf/1902.00163
Compressing GANs using Knowledge Distillation

2019-02-01

Angeline Aguinaldo, Ping-Yeh Chiang, Alex Gain, Ameya Patil, Kolten Pearson, Soheil Feizi

arXiv_CV

arXiv_CV Adversarial Super_Resolution Knowledge GAN Optimization Gradient_Descent
Abstract

Generative Adversarial Networks (GANs) have been used in several machine learning tasks such as domain transfer, super resolution, and synthetic data generation. State-of-the-art GANs often use tens of millions of parameters, making them expensive to deploy for applications in low SWAP (size, weight, and power) hardware, such as mobile devices, and for applications with real time capabilities. There has been no work found to reduce the number of parameters used in GANs. Therefore, we propose a method to compress GANs using knowledge distillation techniques, in which a smaller “student” GAN learns to mimic a larger “teacher” GAN. We show that the distillation methods used on MNIST, CIFAR-10, and Celeb-A datasets can compress teacher GANs at ratios of 1669:1, 58:1, and 87:1, respectively, while retaining the quality of the generated image. From our experiments, we observe a qualitative limit for GAN’s compression. Moreover, we observe that, with a fixed parameter budget, compressed GANs outperform GANs trained using standard training methods. We conjecture that this is partially owing to the optimization landscape of over-parameterized GANs which allows efficient training using alternating gradient descent. Thus, training an over-parameterized GAN followed by our proposed compression scheme provides a high quality generative model with a small number of parameters.

URL

https://arxiv.org/abs/1902.00159

PDF

https://arxiv.org/pdf/1902.00159
Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models

2019-02-01

Dinghan Shen, Asli Celikyilmaz, Yizhe Zhang, Liqun Chen, Xin Wang, Jianfeng Gao, Lawrence Carin

arXiv_CL

arXiv_CL Attention Text_Generation
Abstract

Variational autoencoders (VAEs) have received much attention recently as an end-to-end architecture for text generation with latent variables. In this paper, we investigate several multi-level structures to learn a VAE model that generates long and coherent text. In particular, we use a hierarchy of stochastic layers between the encoder and decoder networks to generate more informative latent codes. We also investigate a multi-level decoder structure to learn a coherent long-term structure by generating intermediate sentence representations as high-level plan vectors. Empirical results demonstrate that a multi-level VAE model produces more coherent and less repetitive long text compared to the standard VAE models and can further mitigate the posterior-collapse issue.

URL

http://arxiv.org/abs/1902.00154

PDF

http://arxiv.org/pdf/1902.00154
Deep Triplet Quantization

2019-02-01

Bin Liu, Yue Cao, Mingsheng Long, Jianmin Wang, Jingdong Wang

arXiv_CV

arXiv_CV Image_Retrieval Deep_Learning
Abstract

Deep hashing establishes efficient and effective image retrieval by end-to-end learning of deep representations and hash codes from similarity data. We present a compact coding solution, focusing on the deep learning-to-quantization approach, which has shown superior performance over hashing solutions for similarity retrieval. We propose Deep Triplet Quantization (DTQ), a novel approach to learning deep quantization models from similarity triplets. To enable more effective triplet training, we design a new triplet selection approach, Group Hard, that randomly selects hard triplets in each image group. To generate compact binary codes, we further apply a triplet quantization with weak orthogonality during triplet training. The quantization loss reduces the codebook redundancy and enhances the quantizability of deep representations through back-propagation. Extensive experiments demonstrate that DTQ can generate high-quality and compact binary codes, yielding state-of-the-art image retrieval performance on three benchmark datasets: NUS-WIDE, CIFAR-10, and MS-COCO.
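
To make the notion of a "hard" triplet concrete, here is a minimal sketch built on a generic triplet margin loss. It is not DTQ's actual loss or its Group Hard procedure: the margin form, the squared Euclidean distance, the single-group simplification, and every function name are assumptions for illustration only.

```python
import random


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss: zero once the negative is far enough away."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, margin + gap)


def sample_hard_triplets(triplets, count, margin=1.0, seed=0):
    """Randomly pick up to `count` triplets that still incur a positive loss.

    `triplets` stands for the candidate triplets of one image group.
    """
    hard = [t for t in triplets if triplet_loss(*t, margin=margin) > 0.0]
    rng = random.Random(seed)
    return rng.sample(hard, min(count, len(hard)))
```

Triplets with zero loss contribute no gradient, which is the usual motivation for restricting training to hard ones; sampling them randomly rather than taking only the hardest is one way to keep training stable.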

URL

http://arxiv.org/abs/1902.00153

PDF

http://arxiv.org/pdf/1902.00153
Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning

2019-01-31

Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, Songhwai Oh

arXiv_AI

arXiv_AI Regularization Reinforcement_Learning
Abstract

In this paper, we present a new class of Markov decision processes (MDPs), called Tsallis MDPs, with Tsallis entropy maximization, which generalizes existing maximum entropy reinforcement learning (RL). A Tsallis MDP provides a unified framework for the original RL problem and RL with various types of entropy, including the well-known standard Shannon-Gibbs (SG) entropy, using an additional real-valued parameter, called an entropic index. By controlling the entropic index, we can generate various types of entropy, including the SG entropy, and a different entropy results in a different class of the optimal policy in Tsallis MDPs. We also provide a full mathematical analysis of Tsallis MDPs, including the optimality condition, performance error bounds, and convergence. Our theoretical result enables us to use any positive entropic index in RL. To handle complex and large-scale problems, we propose a model-free actor-critic RL method using Tsallis entropy maximization. We evaluate the regularization effect of the Tsallis entropy with various values of entropic indices and show that the entropic index controls the exploration tendency of the proposed method. We find that different types of RL problems call for different values of the entropic index. The proposed method is evaluated using the MuJoCo simulator and achieves state-of-the-art performance.
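
How a single entropic index can interpolate between entropy types can be sketched for a discrete distribution. The sketch below uses a common definition of the q-logarithm; the paper's exact parametrization may differ, and the function names are hypothetical.

```python
import math


def q_log(x, q):
    """q-logarithm: (x^(q-1) - 1) / (q - 1), reducing to ln(x) as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (q - 1.0) - 1.0) / (q - 1.0)


def tsallis_entropy(probs, q):
    """Tsallis entropy of a discrete distribution: E[-ln_q p]."""
    return sum(-p * q_log(p, q) for p in probs if p > 0.0)
```

With `q = 1` this is the Shannon-Gibbs entropy; with `q = 2` it becomes `1 - sum(p ** 2)`, which penalizes concentrated policies less sharply near determinism. For a uniform distribution over four actions the two values are `math.log(4)` and `0.75`.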

URL

http://arxiv.org/abs/1902.00137

PDF

http://arxiv.org/pdf/1902.00137
US-net for robust and efficient nuclei instance segmentation

2019-01-31

Zhaoyang Xu, Faranak Sobhani, Carlos Fernandez Moro, Qianni Zhang

arXiv_CV

arXiv_CV Segmentation Detection
Abstract

We present a novel neural network architecture, US-Net, for robust nuclei instance segmentation in histopathology images. The proposed framework integrates the nuclei detection and segmentation networks by sharing their outputs through the same foundation network, thus enhancing the performance of both. The detection network takes into account the high-level semantic cues with contextual information, while the segmentation network focuses more on low-level details such as edges. Extensive experiments reveal that our proposed framework can strengthen the performance of both branch networks in an integrated architecture and outperforms most of the state-of-the-art nuclei detection and segmentation networks.

URL

http://arxiv.org/abs/1902.00125

PDF

http://arxiv.org/pdf/1902.00125
Learning to Make Analogies by Contrasting Abstract Relational Structure

2019-01-31

Felix Hill, Adam Santoro, David G.T. Barrett, Ari S. Morcos, Timothy Lillicrap

arXiv_AI

arXiv_AI Attention Relation
Abstract

Analogical reasoning has been a principal focus of various waves of AI research. Analogy is particularly challenging for machines because it requires relational structures to be represented such that they can be flexibly applied across diverse domains of experience. Here, we study how analogical reasoning can be induced in neural networks that learn to perceive and reason about raw visual data. We find that the critical factor for inducing such a capacity is not an elaborate architecture, but rather, careful attention to the choice of data and the manner in which it is presented to the model. The most robust capacity for analogical reasoning is induced when networks learn analogies by contrasting abstract relational structures in their input domains, a training method that uses only the input data to force models to learn about important abstract features. Using this technique we demonstrate capacities for complex, visual and symbolic analogy making and generalisation in even the simplest neural network architectures.

URL

http://arxiv.org/abs/1902.00120

PDF

http://arxiv.org/pdf/1902.00120
Episodic Training for Domain Generalization

2019-01-31

Da Li, Jianshu Zhang, Yongxin Yang, Cong Liu, Yi-Zhe Song, Timothy M. Hospedales

arXiv_CV

arXiv_CV Recognition
Abstract

Domain generalization (DG) is the challenging and topical problem of learning models that generalize to a novel testing domain with different statistics than a set of known training domains. The simple approach of aggregating data from all source domains and training a single deep neural network end-to-end on all the data provides a surprisingly strong baseline that surpasses many prior published methods. In this paper we build on this strong baseline by designing an episodic training procedure that trains a single deep network in a way that exposes it to the domain shift that characterises a novel domain at runtime. Specifically, we decompose a deep network into feature extractor and classifier components, and then train each component by simulating it interacting with a partner who is badly tuned for the current domain. This makes both components more robust, ultimately leading to our networks producing state-of-the-art performance on three DG benchmarks. As a demonstration, we consider the pervasive workflow of using an ImageNet-trained CNN as a fixed feature extractor for downstream recognition tasks. Using the Visual Decathlon benchmark, we demonstrate that our episodic-DG training improves the performance of such a general-purpose feature extractor by explicitly training it for robustness to novel problems. This provides the largest-scale demonstration of heterogeneous DG to date.

URL

http://arxiv.org/abs/1902.00113

PDF

http://arxiv.org/pdf/1902.00113
Efficient order picking methods in robotic mobile fulfillment systems

2019-01-31

Lin Xie, Nils Thieme, Ruslan Krenzler, Hanyi Li

arXiv_AI

arXiv_AI Knowledge Attention
Abstract

Robotic mobile fulfillment systems (RMFSs) are a new type of warehousing system, which has received more attention recently, due to increasing growth in the e-commerce sector. Instead of sending pickers to the inventory area to search for and pick the ordered items, robots carry shelves (called “pods”) including ordered items from the inventory area to picking stations. In the picking stations, human pickers put ordered items into totes; then these items are transported by a conveyor to the packing stations. This type of warehousing system relieves the human pickers and improves the picking process. In this paper, we concentrate on decisions about the assignment of pods to stations and orders to stations to fulfill picking for each incoming customer’s order. In previous research for an RMFS with multiple picking stations, these decisions are made sequentially. Instead, we present a new integrated model. To improve the system performance even more, we extend our model by splitting orders. This means parts of an order are allowed to be picked at different stations. To the best of the authors’ knowledge, this is the first publication on split orders in an RMFS. We analyze different performance metrics, such as pile-on, pod-station visits, robot moving distance and order turn-over time. We compare the results of our models in different instances with the sequential method in our open-source simulation framework RAWSim-O.

URL

http://arxiv.org/abs/1902.03092

PDF

http://arxiv.org/pdf/1902.03092
Learning Metric Graphs for Neuron Segmentation In Electron Microscopy Images

2019-01-31

Kyle Luther, H. Sebastian Seung

arXiv_CV

arXiv_CV Segmentation CNN
Abstract

In the deep metric learning approach to image segmentation, a convolutional net densely generates feature vectors at the pixels of an image. Pairs of feature vectors are trained to be similar or different, depending on whether the corresponding pixels belong to the same or different ground truth segments. To segment a new image, the feature vectors are computed and clustered. Both empirically and theoretically, it is unclear whether or when deep metric learning is superior to the more conventional approach of directly predicting an affinity graph with a convolutional net. We compare the two approaches using brain images from serial section electron microscopy, which constitute an especially challenging example of instance segmentation. We first show that seed-based postprocessing of the feature vectors, as originally proposed, produces inferior accuracy because it is difficult for the convolutional net to predict feature vectors that remain uniform across large objects. Then we consider postprocessing by thresholding a nearest neighbor graph followed by connected components. In this case, segmentations from a “metric graph” turn out to be competitive or even superior to segmentations from a directly predicted affinity graph. To explain these findings theoretically, we invoke the property that the metric function satisfies the triangle inequality. We then show an example where this constraint suppresses noise, causing connected components to segment a metric graph more robustly than an unconstrained affinity graph.
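
The postprocessing step described above (threshold a nearest neighbor graph, then take connected components) can be sketched as follows. The 4-connected pixel neighborhood, the Euclidean metric, the grid-of-tuples input, and the function name are assumptions made for illustration; the paper's actual graph and metric may differ.

```python
import math


def segment_metric_graph(features, threshold):
    """Label pixels by connected components of a thresholded neighbor graph.

    features: 2-D grid (list of rows) of per-pixel feature vectors.
    Two adjacent pixels are linked when their feature distance is below
    `threshold`. Returns a grid of integer segment labels.
    """
    rows, cols = len(features), len(features[0])
    parent = list(range(rows * cols))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    if math.dist(features[r][c], features[nr][nc]) < threshold:
                        union(r * cols + c, nr * cols + nc)

    labels = {}
    segmentation = []
    for r in range(rows):
        row = []
        for c in range(cols):
            root = find(r * cols + c)
            row.append(labels.setdefault(root, len(labels)))
        segmentation.append(row)
    return segmentation
```

Because edges only compare neighboring pixels, features may drift slowly across a large object without splitting it, which is consistent with the abstract's point that seed-based clustering struggles when features are not uniform.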

URL

http://arxiv.org/abs/1902.00100

PDF

http://arxiv.org/pdf/1902.00100
The Second Conversational Intelligence Challenge

2019-01-31

Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, Shrimai Prabhumoye, Alan W Black, Alexander Rudnicky, Jason Williams, Joelle Pineau, Mikhail Burtsev, Jason Weston

arXiv_AI

arXiv_AI QA
Abstract

We describe the setting and results of the ConvAI2 NeurIPS competition that aims to further the state-of-the-art in open-domain chatbots. Some key takeaways from the competition are: (i) pretrained Transformer variants are currently the best performing models on this task, (ii) but to improve performance on multi-turn conversations with humans, future systems must go beyond single word metrics like perplexity to measure the performance across sequences of utterances (conversations) – in terms of repetition, consistency and balance of dialogue acts (e.g. how many questions asked vs. answered).

URL

http://arxiv.org/abs/1902.00098

PDF

http://arxiv.org/pdf/1902.00098
Improving Dense Crowd Counting Convolutional Neural Networks using Inverse k-Nearest Neighbor Maps and Multiscale Upsampling

2019-01-31

Greg Olmschenk, Hao Tang, Zhigang Zhu

arXiv_CV

arXiv_CV CNN
Abstract

Gatherings of thousands to millions of people occur frequently for an enormous variety of events, and automated counting of these high density crowds is used for safety, management, and measuring significance of these events. In this work, we show that the regularly accepted labeling scheme of crowd density maps for training deep neural networks is less effective than our alternative inverse k-nearest neighbor (i$k$NN) maps, even when used directly in existing state-of-the-art network structures. We also provide a new network architecture MUD-i$k$NN, which uses multi-scale upsampling via transposed convolutions to take full advantage of the provided i$k$NN labeling. This upsampling combined with the i$k$NN maps further outperforms the existing state-of-the-art methods. The full label comparison emphasizes the importance of the labeling scheme, with the i$k$NN labeling being particularly effective. We demonstrate the accuracy of our MUD-i$k$NN network and the i$k$NN labeling scheme on a variety of datasets.
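
A minimal sketch of an inverse k-nearest neighbor label map follows. It assumes the map value at each pixel is `1 / (d + 1)`, where `d` is the mean distance to the `k` nearest head annotations; this form, the function name, and the `(x, y)` annotation layout are assumptions for illustration and should be checked against the paper.

```python
import math


def iknn_map(height, width, heads, k=1):
    """Build an inverse kNN label map for a list of (x, y) head positions.

    Each pixel gets 1 / (mean distance to its k nearest heads + 1), so values
    peak at 1.0 on an annotation and decay smoothly with distance.
    """
    if not heads:
        raise ValueError("at least one head annotation is required")
    k = min(k, len(heads))
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            dists = sorted(math.hypot(x - hx, y - hy) for hx, hy in heads)
            mean_knn = sum(dists[:k]) / k
            row.append(1.0 / (mean_knn + 1.0))
        result.append(row)
    return result
```

Unlike a density map, where most pixels are near zero, every pixel here carries a graded signal about how close the nearest people are, which is one plausible reason such labels are easier to learn from.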

URL

http://arxiv.org/abs/1902.05379

PDF

http://arxiv.org/pdf/1902.05379
Characterizing Input Methods for Human-to-robot Demonstrations

2019-01-31

Pragathi Praveena, Guru Subramani, Bilge Mutlu, Michael Gleicher

arXiv_RO

arXiv_RO
Abstract

Human demonstrations are important in a range of robotics applications, and are created with a variety of input methods. However, the design space for these input methods has not been extensively studied. In this paper, focusing on demonstrations of hand-scale object manipulation tasks to robot arms with two-finger grippers, we identify distinct usage paradigms in robotics that utilize human-to-robot demonstrations, extract abstract features that form a design space for input methods, and characterize existing input methods as well as a novel input method that we introduce, the instrumented tongs. We detail the design specifications for our method and present a user study that compares it against three common input methods: free-hand manipulation, kinesthetic guidance, and teleoperation. Study results show that instrumented tongs provide high quality demonstrations and a positive experience for the demonstrator while offering good correspondence to the target robot.

URL

http://arxiv.org/abs/1902.00084

PDF

http://arxiv.org/pdf/1902.00084
Rhythm Zone Theory: Speech Rhythms are Physical after all

2019-01-31

Dafydd Gibbon, Xuewei Lin

arXiv_CL

arXiv_CL Detection
Abstract

Speech rhythms have been dealt with in three main ways: from the introspective analyses of rhythm as a correlate of syllable and foot timing in linguistics and applied linguistics, through analyses of durations of segments of utterances associated with consonantal and vocalic properties, syllables, feet and words, to models of rhythms in speech production and perception as physical oscillations. The present study avoids introspection and human-filtered annotation methods and extends the signal processing paradigm of amplitude envelope spectrum analysis by adding an additional analytic step of edge detection, and postulating the co-existence of multiple speech rhythms in rhythm zones marked by identifiable edges (Rhythm Zone Theory, RZT). An exploratory investigation of the utility of RZT is conducted, suggesting that native and non-native readings of the same text are distinct sub-genres of read speech: a reading by a US native speaker and non-native readings by relatively low-performing Cantonese adult learners of English. The study concludes by noting that with the methods used, RZT can distinguish between the speech rhythms of well-defined sub-genres of native speaker reading vs. non-native learner reading, but needs further refinement in order to be applied to the paradoxically more complex speech of low-performing language learners, whose speech rhythms are co-determined by non-fluency and disfluency factors in addition to well-known linguistic factors of grammar, vocabulary and discourse constraints.

URL

http://arxiv.org/abs/1902.01267

PDF

http://arxiv.org/pdf/1902.01267
Comparison and Experimental Validation of Predictive Models for Soft, Fiber-Reinforced Actuators

2019-01-31

Audrey Sedal, Alan Wineman, R Brent Gillespie, C David Remy

arXiv_RO

arXiv_RO Relation
Abstract

Successful soft robot modeling approaches appearing in recent literature have been based on a variety of distinct theories, including traditional robotic theory, continuum mechanics, and machine learning. Though specific modeling techniques have been developed for and validated against already realized systems, their strengths and weaknesses have not been explicitly compared against each other. In this paper, we show how three distinct model structures (a lumped-parameter model, a continuum mechanical model, and a neural network) compare in capturing the gross trends and specific features of the force generation of soft robotic actuators. In particular, we study models for Fiber Reinforced Elastomeric Enclosures (FREEs), which are a popular choice of soft actuator and are used in several soft articulated systems, including soft manipulators, exoskeletons, grippers, and locomoting soft robots. We generated benchmark data by testing eight FREE samples that spanned broad design and kinematic spaces and compared the models on their ability to predict the loading-deformation relationships of these samples. This comparison shows the predictive capabilities of each model on individual actuators and each model’s generalizability across the design space. While the neural net achieved the highest peak performance, the first principles-based models generalized best across all actuator design parameters tested. The results highlight the essential roles of mathematical structure and experimental parameter determination in building high-performing, generalizable soft actuator models with varying effort invested in system identification.

URL

http://arxiv.org/abs/1902.00054

PDF

http://arxiv.org/pdf/1902.00054
Gaussian Conditional Random Fields for Classification

2019-01-31

Andrija Petrović, Mladen Nikolić, Miloš Jovanović, Boris Delibašić

arXiv_AI

arXiv_AI Inference Classification Prediction
Abstract

Gaussian conditional random fields (GCRF) are a well-known structured model for continuous outputs that uses multiple unstructured predictors to form its features and at the same time exploits the dependence structure among outputs, which is provided by a similarity measure. In this paper, a Gaussian conditional random fields model for structured binary classification (GCRFBC) is proposed. The model is applicable to classification problems with undirected graphs, which are intractable for standard classification CRFs. The model representation of GCRFBC is extended by latent variables which yield some appealing properties. Thanks to the GCRF latent structure, the model becomes tractable, efficient and open to improvements previously applied to GCRF regression models. In addition, the model allows for reduction of noise that might appear if structures were defined directly between discrete outputs. Additionally, two different forms of the algorithm are presented: GCRFBCb (GCRFBC - Bayesian) and GCRFBCnb (GCRFBC - non-Bayesian). The extended method of local variational approximation of the sigmoid function is used for solving empirical Bayes in the Bayesian GCRFBCb variant, whereas the MAP value of latent variables is the basis for learning and inference in the GCRFBCnb variant. The inference in GCRFBCb is solved by Newton-Cotes formulas for one-dimensional integration. Both models are evaluated on synthetic data and real-world data. It was shown that both models achieve better prediction performance than unstructured predictors. Furthermore, computational and memory complexity is evaluated. Advantages and disadvantages of the proposed GCRFBCb and GCRFBCnb are discussed in detail.

URL

http://arxiv.org/abs/1902.00045

PDF

http://arxiv.org/pdf/1902.00045
Read All
Your Gameplay Says it All: Modelling Motivation in Tom Clancy's The Division

2019-01-31

David Melhart, Ahmad Azadvar, Alessandro Canossa, Antonios Liapis, Georgios N. Yannakakis

arXiv_AI

arXiv_AI Survey
Abstract

Is it possible to predict the motivation of players just by observing their gameplay data? Even if so, how should we measure motivation in the first place? To address these questions, on the one hand, we collect a large dataset of gameplay data from players of the popular game Tom Clancy’s The Division (Ubisoft, 2016). On the other hand, we ask them to report their levels of competence, autonomy, relatedness and presence using the in-house designed Ubisoft Perceived Experience Questionnaire. After processing the survey responses in an ordinal fashion, we employ preference learning methods, based on support vector machines, to infer the mapping between gameplay and the four motivation factors. Our key findings suggest that gameplay features are strong predictors of player motivation, as the obtained models reach accuracies of near certainty, in particular from 93% up to 97% on unseen players.
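
One common way to treat ratings ordinally is to turn them into pairwise preferences before training a ranking SVM. The sketch below shows only that data transformation, under the assumption that each player has a feature vector and one ordinal rating; the function name and the handling of ties are hypothetical, and the abstract does not say which exact procedure the authors use.

```python
def preference_pairs(features, ratings):
    """Turn per-player feature vectors and ordinal ratings into pairwise training data."""
    pairs = []
    for i in range(len(ratings)):
        for j in range(i + 1, len(ratings)):
            if ratings[i] == ratings[j]:
                continue  # ties carry no preference information
            # Difference vector for the pair (player i, player j).
            diff = [a - b for a, b in zip(features[i], features[j])]
            # +1 if player i reported the higher level, -1 otherwise.
            label = 1 if ratings[i] > ratings[j] else -1
            pairs.append((diff, label))
    return pairs
```

A linear classifier trained on these difference vectors then acts as a ranking function over players, without assuming that the distance between rating levels is meaningful.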

URL

http://arxiv.org/abs/1902.00040

PDF

http://arxiv.org/pdf/1902.00040
Read All
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection

2019-01-31

Hedi Ben-younes, Rémi Cadene, Nicolas Thome, Matthieu Cord

arXiv_CV

arXiv_CV QA Represenation_Learning Deep_Learning Detection Relation VQA
Abstract

Multimodal representation learning is gaining more and more interest within the deep learning community. While bilinear models provide an interesting framework to find subtle combinations of modalities, their number of parameters grows quadratically with the input dimensions, making their practical implementation within classical deep learning pipelines challenging. In this paper, we introduce BLOCK, a new multimodal fusion based on the block-superdiagonal tensor decomposition. It leverages the notion of block-term ranks, which generalizes both the concepts of rank and mode ranks for tensors, already used for multimodal fusion. It allows us to define new ways of optimizing the tradeoff between the expressiveness and complexity of the fusion model, and is able to represent very fine interactions between modalities while maintaining powerful mono-modal representations. We demonstrate the practical interest of our fusion model by using BLOCK for two challenging tasks: Visual Question Answering (VQA) and Visual Relationship Detection (VRD), where we design end-to-end learnable architectures for representing relevant interactions between modalities. Through extensive experiments, we show that BLOCK compares favorably with state-of-the-art multimodal fusion models for both VQA and VRD tasks. Our code is available at https://github.com/Cadene/block.bootstrap.pytorch.
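
To make the parameter growth concrete, the sketch below counts parameters for a full bilinear tensor and for a block-term decomposition with several small core blocks. The counting formulas follow the standard block-term decomposition and are an assumption about how to illustrate the abstract; the function names and any dimensions passed in are hypothetical and are not taken from the paper or its code.

```python
def full_bilinear_params(dx, dz, dy):
    # A full bilinear model stores one weight per (x input, z input, output) triple.
    return dx * dz * dy


def block_term_params(dx, dz, dy, blocks, a, b, c):
    # Core: `blocks` small tensors, each of shape (a, b, c).
    core = blocks * a * b * c
    # Factor matrices project each mode onto the concatenated block dimensions.
    factors = dx * blocks * a + dz * blocks * b + dy * blocks * c
    return core + factors
```

Varying the number of blocks and the block sizes trades expressiveness against parameter count, which is the tradeoff the abstract refers to.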

URL

http://arxiv.org/abs/1902.00038

PDF

http://arxiv.org/pdf/1902.00038
Read All
Towards Machine-assisted Meta-Studies: The Hubble Constant

2019-01-31

Tom Crossland, Pontus Stenetorp, Sebastian Riedel, Daisuke Kawata, Thomas D. Kitching, Rupert A. C. Croft

arXiv_CL

arXiv_CL
Abstract

We present an approach for automatic extraction of measured values from the astrophysical literature, using the Hubble constant for our pilot study. Our rules-based model – a classical technique in natural language processing – has successfully extracted 298 measurements of the Hubble constant, with uncertainties, from the 208,541 available arXiv astrophysics papers. We have also created an artificial neural network classifier to identify papers which report novel measurements. This classifier is applied to the available arXiv data, and is demonstrated to work well in identifying papers which are reporting new measurements. From the analysis of our results we find that reporting measurements with uncertainties and the correct units is critical information to identify novel measurements in free text. Our results correctly highlight the current tension for measurements of the Hubble constant and recover the $3.5\sigma$ discrepancy – demonstrating that the tool presented in this paper is useful for meta-studies of astrophysical measurements from a large number of publications, and showing the potential to generalise this technique to other areas.
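
As a flavor of what a rules-based extractor can look like, the sketch below uses a single regular expression to pull a value, an uncertainty and a unit from free text. The pattern, the function name and the accepted spellings are hypothetical and far simpler than a real system would need; they are not the authors' rules.

```python
import re

# Hypothetical pattern: "H0 = <value> ± <error> <unit>", with a few spelling variants.
MEASUREMENT = re.compile(
    r"H_?0\s*=\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?:±|\+/-|\\pm)\s*"
    r"(?P<error>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>km\s*/\s*s\s*/\s*Mpc|km\s*s\^?-1\s*Mpc\^?-1)"
)


def extract_measurements(text):
    # Return (value, uncertainty, unit) for every match; mentions that lack
    # an uncertainty or a unit are deliberately ignored.
    return [
        (float(m["value"]), float(m["error"]), m["unit"])
        for m in MEASUREMENT.finditer(text)
    ]
```

Requiring both the uncertainty and the unit mirrors the abstract's observation that these are critical for identifying measurements in free text.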

URL

http://arxiv.org/abs/1902.00027

PDF

http://arxiv.org/pdf/1902.00027
Read All
A Geometric Perspective on Optimal Representations for Reinforcement Learning

2019-01-31

Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle

arXiv_AI

arXiv_AI Adversarial Reinforcement_Learning Represenation_Learning Optimization Prediction
Abstract

This paper proposes a new approach to representation learning based on geometric properties of the space of value functions. We study a two-part approximation of the value function: a nonlinear map from states to vectors, or representation, followed by a linear map from vectors to values. Our formulation considers adapting the representation to minimize the (linear) approximation error of the value function of all stationary policies for a given environment. We show that this optimization reduces to making accurate predictions regarding a special class of value functions which we call adversarial value functions (AVFs). We argue that these AVFs make excellent auxiliary tasks, and use them to construct a loss which can be efficiently minimized to find a near-optimal representation for reinforcement learning. We highlight characteristics of the method in a series of experiments on the four-room domain.
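
The two-part approximation described above can be sketched in a few lines: a nonlinear representation followed by a linear value head. The choice of tanh, the weight layout and the function names are assumptions for illustration; the abstract does not specify the form of the nonlinear map.

```python
import math


def representation(state, weights):
    # Nonlinear map from a state vector to a feature vector.
    # Each row of `weights` produces one feature; tanh is an assumed choice.
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]


def value(state, weights, theta):
    # Linear map from the feature vector to a scalar value estimate.
    phi = representation(state, weights)
    return sum(t * p for t, p in zip(theta, phi))
```

In this picture, each policy would get its own linear head `theta` while the representation `weights` are shared across policies.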

URL

http://arxiv.org/abs/1901.11530

PDF

http://arxiv.org/pdf/1901.11530
Read All
Visual Hindsight Experience Replay

2019-01-31

Himanshu Sahni, Toby Buckley, Pieter Abbeel, Ilya Kuzovkin

arXiv_AI

arXiv_AI Sparse Reinforcement_Learning
Abstract

Reinforcement Learning algorithms typically require millions of environment interactions to learn successful policies in sparse reward settings. Hindsight Experience Replay (HER) was introduced as a technique to increase sample efficiency through re-imagining unsuccessful trajectories as successful ones by replacing the originally intended goals. However, this method is not applicable to visual domains where the goal configuration is unknown and must be inferred from observation. In this work, we show how unsuccessful visual trajectories can be hallucinated to be successful using a generative model trained on relatively few snapshots of the goal. As far as we are aware, this is the first work that does so with the agent policy conditioned solely on its state. We then apply this model to training reinforcement learning agents in discrete and continuous settings. We show results on a navigation and pick-and-place task in a 3D environment and on a simulated robotics application. Our method shows marked improvement over standard RL algorithms and baselines derived from prior work.
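
For context, the goal replacement that the original HER performs can be sketched as below, for the simple case where goals are explicit states. This is not the visual variant proposed in the paper, which per the abstract uses a generative model because the goal configuration is unknown. The `Transition` fields, the 0 / -1 reward convention and the choice of the final achieved state as the new goal are assumptions made for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Transition:
    state: tuple
    action: int
    next_state: tuple
    goal: tuple
    reward: float


def sparse_reward(next_state, goal):
    # 0 on reaching the goal, -1 otherwise (an assumed sparse-reward convention).
    return 0.0 if next_state == goal else -1.0


def relabel_with_final_state(trajectory):
    # Pretend the state actually reached at the end was the goal all along,
    # and recompute every reward against that substituted goal.
    achieved = trajectory[-1].next_state
    return [
        replace(t, goal=achieved, reward=sparse_reward(t.next_state, achieved))
        for t in trajectory
    ]
```

The relabeled copy of a failed trajectory ends in a successful transition, so it can be added to the replay buffer alongside the original to provide a learning signal.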

URL

http://arxiv.org/abs/1901.11529

PDF

http://arxiv.org/pdf/1901.11529
Read All
