Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Modified Distribution Alignment for Domain Adaptation with Pre-trained Inception ResNet

2019-04-04

Youshan Zhang, Brian D. Davison

arXiv_CV

arXiv_CV Classification Recognition
Abstract

Deep neural networks have been widely used in computer vision. There are several well-trained deep neural networks for the ImageNet classification challenge, which has played a significant role in image recognition. However, little work has explored pre-trained neural networks for image recognition in domain adaptation. In this paper, we are the first to extract better-represented features from a pre-trained Inception ResNet model for domain adaptation. We then present a modified distribution alignment method for classification using the extracted features. We test our model using three benchmark datasets (Office+Caltech-10, Office-31, and Office-Home). Extensive experiments demonstrate significant improvements (4.8%, 5.5%, and 10%) in classification accuracy over the state-of-the-art.

URL

http://arxiv.org/abs/1904.02322

PDF

http://arxiv.org/pdf/1904.02322
Guiding Extractive Summarization with Question-Answering Rewards

2019-04-04

Kristjan Arumae, Fei Liu

arXiv_CL

arXiv_CL Salient Summarization
Abstract

Highlighting while reading is a natural behavior for people to track salient content of a document. It would be desirable to teach an extractive summarizer to do the same. However, a major obstacle to the development of a supervised summarizer is the lack of ground-truth. Manual annotation of extraction units is cost-prohibitive, whereas acquiring labels by automatically aligning human abstracts and source documents can yield inferior results. In this paper we describe a novel framework to guide a supervised, extractive summarization system with question-answering rewards. We argue that quality summaries should serve as a document surrogate to answer important questions, and such question-answer pairs can be conveniently obtained from human abstracts. The system learns to promote summaries that are informative, fluent, and perform competitively on question-answering. Our results compare favorably with those reported by strong summarization baselines as evaluated by automatic metrics and human assessors.

URL

http://arxiv.org/abs/1904.02321

PDF

http://arxiv.org/pdf/1904.02321
Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction

2019-04-04

Longlong Jing, Xiaodong Yang, Jingen Liu, Yingli Tian

arXiv_CV

arXiv_CV Video_Caption Action_Recognition Prediction Recognition
Abstract

The success of deep neural networks generally requires a vast amount of training data to be labeled, which is expensive and infeasible at scale, especially for video collections. To alleviate this problem, in this paper, we propose 3DRotNet: a fully self-supervised approach to learn spatiotemporal features from unlabeled videos. A set of rotations is applied to all videos, and a pretext task is defined as prediction of these rotations. When accomplishing this task, 3DRotNet is actually trained to understand the semantic concepts and motions in videos. In other words, it learns a spatiotemporal video representation, which can be transferred to improve video understanding tasks in small datasets. Our extensive experiments successfully demonstrate the effectiveness of the proposed framework on action recognition, leading to significant improvements over the state-of-the-art self-supervised methods. With the self-supervised pre-trained 3DRotNet from large datasets, the recognition accuracy is boosted by 20.4% on UCF101 and 16.7% on HMDB51 respectively, compared to the models trained from scratch.
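The rotation pretext task in the abstract above can be illustrated with a small sketch. It assumes the rotation set is the four multiples of 90 degrees and represents a clip as nested lists of pixel values; the abstract only says "a set of rotations", so both choices, and all function names, are illustrative assumptions and not the authors' implementation.

```python
from typing import List, Tuple

Frame = List[List[int]]  # one grayscale frame as rows of pixel values
Clip = List[Frame]       # a video clip as a list of frames


def rotate90(frame: Frame) -> Frame:
    """Rotate one frame by 90 degrees clockwise."""
    return [list(row) for row in zip(*frame[::-1])]


def rotate_clip(clip: Clip, quarter_turns: int) -> Clip:
    """Apply the same rotation to every frame of the clip."""
    rotated = clip
    for _ in range(quarter_turns % 4):
        rotated = [rotate90(frame) for frame in rotated]
    return rotated


def make_pretext_samples(clip: Clip) -> List[Tuple[Clip, int]]:
    """Return (rotated clip, label) pairs; label k stands for k * 90 degrees.

    Assumption: four rotations (0, 90, 180, 270 degrees).
    """
    return [(rotate_clip(clip, k), k) for k in range(4)]


# A toy clip with two 2x2 frames.
clip = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
samples = make_pretext_samples(clip)
```

A classifier trained to predict the label from the rotated clip needs no manual annotation, since the labels come from the transformation itself.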

URL

http://arxiv.org/abs/1811.11387

PDF

http://arxiv.org/pdf/1811.11387
Towards a Robust Aerial Cinematography Platform: Localizing and Tracking Moving Targets in Unstructured Environments

2019-04-04

Rogerio Bonatti, Cherie Ho, Wenshan Wang, Sanjiban Choudhury, Sebastian Scherer

arXiv_CV

arXiv_CV Pose_Estimation Tracking Drone
Abstract

The use of drones for aerial cinematography has revolutionized several applications and industries requiring live and dynamic camera viewpoints such as entertainment, sports, and security. However, safely controlling a drone while filming a moving target usually requires multiple expert human operators; hence the need for an autonomous cinematographer. Current approaches have severe real-life limitations such as requiring scripted scenes that can be solved offline, high-precision motion-capture systems or GPS tags to localize targets, and prior maps of the environment to avoid obstacles and plan for occlusion. In this work, we overcome such limitations and propose a complete system for aerial cinematography that combines: (1) a visual pose estimation algorithm for target localization; (2) a real-time incremental 3D signed-distance map algorithm for occlusion and safety computation; and (3) a real-time camera motion planner that optimizes smoothness, collisions, occlusions and artistic guidelines. We evaluate robustness and real-time performance in a series of field experiments and simulations by tracking dynamic targets moving through unknown, unstructured environments. Finally, we verify that despite removing previous limitations, our system still matches state-of-the-art performance. Videos of the system in action can be seen at https://youtu.be/ZE9MnCVmumc

URL

http://arxiv.org/abs/1904.02319

PDF

http://arxiv.org/pdf/1904.02319
Comparison Network for One-Shot Conditional Object Detection

2019-04-04

Tengfei Zhang, Yue Zhang, Xian Sun, Hao Sun, Menglong Yan, Xue Yang, Kun Fu

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

The current advances in object detection depend on large-scale datasets to get good performance. However, sufficient samples are not always available, which has led to research on few-shot detection and its extreme variant, one-shot detection. In this paper, one-shot detection is formulated as a conditional probability problem. With this insight, a novel one-shot conditional object detection (OSCD) framework, referred to as Comparison Network (ComparisonNet), is proposed. Specifically, query and target image features are extracted through a Siamese network as mapped metrics of marginal probabilities. A two-stage detector for OSCD is introduced to compare the extracted query and target features with the learnable metric to approach the optimized non-linear conditional probability. Once trained, ComparisonNet can detect objects of both seen and unseen classes without further training; it is class-agnostic, requires no training for unseen classes, and does not suffer from catastrophic forgetting. Experiments show that the proposed approach achieves state-of-the-art performance on the proposed datasets of Fashion-MNIST and PASCAL VOC.

URL

http://arxiv.org/abs/1904.02317

PDF

http://arxiv.org/pdf/1904.02317
Simple Question Answering with Subgraph Ranking and Joint-Scoring

2019-04-04

Wenbo Zhao, Tagyoung Chung, Anuj Goyal, Angeliki Metallinou

arXiv_CL

arXiv_CL Knowledge_Graph Knowledge QA Relation
Abstract

Knowledge graph based simple question answering (KBSQA) is a major area of research within question answering. Although only dealing with simple questions, i.e., questions that can be answered through a single knowledge base (KB) fact, this task is neither simple nor close to being solved. Targeting the two main steps, subgraph selection and fact selection, the research community has developed sophisticated approaches. However, the importance of subgraph ranking and of leveraging the subject–relation dependency of a KB fact has not been sufficiently explored. Motivated by this, we present a unified framework to describe and analyze existing approaches. Using this framework as a starting point, we focus on two aspects: improving subgraph selection through a novel ranking method and leveraging the subject–relation dependency by proposing a joint scoring CNN model with a novel loss function that enforces the well-order of scores. Our methods achieve a new state of the art (85.44% in accuracy) on the SimpleQuestions dataset.

URL

http://arxiv.org/abs/1904.04049

PDF

http://arxiv.org/pdf/1904.04049
Improved Inference via Deep Input Transfer

2019-04-04

Saied Asgari Taghanaki, Kumar Abhishek, Ghassan Hamarneh

arXiv_CV

arXiv_CV Segmentation CNN Inference
Abstract

Although numerous improvements have been made in the field of image segmentation using convolutional neural networks, the majority of these improvements rely on training with larger datasets, model architecture modifications, novel loss functions, and better optimizers. In this paper, we propose a new segmentation performance boosting paradigm that relies on optimally modifying the network’s input instead of the network itself. In particular, we leverage the gradients of a trained segmentation network with respect to the input to transfer it to a space where the segmentation accuracy improves. We test the proposed method on three publicly available medical image segmentation datasets: the ISIC 2017 Skin Lesion Segmentation dataset, the Shenzhen Chest X-Ray dataset, and the CVC-ColonDB dataset, for which our method achieves improvements of 5.8%, 0.5%, and 4.8% in the average Dice scores, respectively.
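The improvements above are reported as average Dice scores. As a reminder of what that metric measures, here is a minimal sketch of the Dice coefficient for two binary masks. The flat 0/1 list representation, the function name, and the convention for two empty masks are assumptions made for illustration; the paper's evaluation code is not shown in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A n B| / (|A| + |B|) for binary masks given as flat 0/1 lists."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [1, 1, 0, 0, 1, 0]    # predicted foreground pixels
target = [1, 0, 0, 0, 1, 1]  # ground-truth foreground pixels
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
```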

URL

http://arxiv.org/abs/1904.02307

PDF

http://arxiv.org/pdf/1904.02307
A Simple Joint Model for Improved Contextual Neural Lemmatization

2019-04-04

Chaitanya Malaviya, Shijie Wu, Ryan Cotterell

arXiv_CL

arXiv_CL
Abstract

English verbs have multiple forms. For instance, talk may also appear as talks, talked or talking, depending on the context. The NLP task of lemmatization seeks to map these diverse forms back to a canonical one, known as the lemma. We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages from the Universal Dependencies corpora. Our paper describes the model in addition to training and decoding procedures. Error analysis indicates that joint morphological tagging and lemmatization is especially helpful in low-resource lemmatization and languages that display a larger degree of morphological complexity. Code and pre-trained models are available at https://sigmorphon.github.io/sharedtasks/2019/task2/.

URL

http://arxiv.org/abs/1904.02306

PDF

http://arxiv.org/pdf/1904.02306
Robust Deep Gaussian Processes

2019-04-04

Jeremias Knoblauch

arXiv_AI

arXiv_AI Inference
Abstract

This report provides an in-depth overview of the implications and novelty that Generalized Variational Inference (GVI) (Knoblauch et al., 2019) brings to Deep Gaussian Processes (DGPs) (Damianou & Lawrence, 2013). Specifically, robustness to model misspecification as well as principled alternatives for uncertainty quantification are motivated with an information-geometric view. These modifications have clear interpretations and can be implemented in less than 100 lines of Python code. Most importantly, the corresponding empirical results show that DGPs can greatly benefit from the presented enhancements.

URL

http://arxiv.org/abs/1904.02303

PDF

http://arxiv.org/pdf/1904.02303
A Training-free, One-shot Detection Framework For Geospatial Objects In Remote Sensing Images

2019-04-04

Tengfei Zhang, Yue Zhang, Xian Sun, Menglong Yan, Yaoling Wang, Kun Fu

arXiv_CV

arXiv_CV Object_Detection Knowledge Deep_Learning Detection
Abstract

Deep learning based object detection has achieved great success. However, these supervised learning methods are data-hungry and time-consuming. This restriction makes them unsuitable for limited data and urgent tasks, especially in the applications of remote sensing. Inspired by the ability of humans to quickly learn new visual concepts from very few examples, we propose a training-free, one-shot geospatial object detection framework for remote sensing images. It consists of (1) a feature extractor with remote sensing domain knowledge, (2) a multi-level feature fusion method, (3) a novel similarity metric method, and (4) a 2-stage object detection pipeline. Experiments on sewage treatment plant and airport detection show that the proposed method is effective to a certain extent. Our method can serve as a baseline for training-free, one-shot geospatial object detection.

URL

http://arxiv.org/abs/1904.02302

PDF

http://arxiv.org/pdf/1904.02302
Cost-Sensitive Feature Selection by Optimizing F-Measures

2019-04-04

Meng Liu, Chang Xu, Yong Luo, Chao Xu, Yonggang Wen, Dacheng Tao

arXiv_CV

arXiv_CV Optimization Classification
Abstract

Feature selection is beneficial for improving the performance of general machine learning tasks by extracting an informative subset from the high-dimensional features. Conventional feature selection methods usually ignore the class imbalance problem, so the selected features are biased towards the majority class. Considering that F-measure is a more reasonable performance measure than accuracy for imbalanced data, this paper presents an effective feature selection algorithm that explores the class imbalance issue by optimizing F-measures. Since F-measure optimization can be decomposed into a series of cost-sensitive classification problems, we investigate the cost-sensitive feature selection by generating and assigning different costs to each class with rigorous theory guidance. After solving a series of cost-sensitive feature selection problems, features corresponding to the best F-measure will be selected. In this way, the selected features will fully represent the properties of all classes. Experimental results on popular benchmarks and challenging real-world data sets demonstrate the significance of cost-sensitive feature selection for the imbalanced data setting and validate the effectiveness of the proposed method.
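The claim that F-measure is more informative than accuracy on imbalanced data can be checked with a small worked example. The sketch below uses the standard F-beta definition; the confusion counts are made up for illustration, and the paper's cost-sensitive decomposition is not reproduced here.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from confusion counts; beta=1 gives the usual F1."""
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    # With no true positives and no errors on positives the score is defined as 0 here.
    return 0.0 if denom == 0 else (1.0 + b2) * tp / denom


def accuracy(tp, fp, fn, tn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical imbalanced data set: 10 positives among 1000 samples.
# A classifier that always predicts the majority (negative) class:
acc_majority = accuracy(tp=0, fp=0, fn=10, tn=990)
f1_majority = f_measure(tp=0, fp=0, fn=10)

# A classifier that finds 8 of the 10 positives at the cost of 20 false alarms:
acc_useful = accuracy(tp=8, fp=20, fn=2, tn=970)
f1_useful = f_measure(tp=8, fp=20, fn=2)
```

The majority-class classifier scores higher on accuracy (0.99 against 0.978) but has an F1 of 0, while the second classifier reaches an F1 of about 0.42.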

URL

http://arxiv.org/abs/1904.02301

PDF

http://arxiv.org/pdf/1904.02301
Gated-GAN: Adversarial Gated Networks for Multi-Collection Style Transfer

2019-04-04

Xinyuan Chen, Chang Xu, Xiaokang Yang, Li Song, Dacheng Tao

arXiv_CV

arXiv_CV Adversarial GAN Style_Transfer
Abstract

Style transfer describes the rendering of an image's semantic content as different artistic styles. Recently, generative adversarial networks (GANs) have emerged as an effective approach in style transfer by adversarially training the generator to synthesize convincing counterfeits. However, traditional GANs suffer from the mode collapse issue, resulting in unstable training and making style transfer quality difficult to guarantee. In addition, the GAN generator is only compatible with one style, so a series of GANs must be trained to provide users with choices to transfer more than one kind of style. In this paper, we focus on tackling these challenges and limitations to improve style transfer. We propose adversarial gated networks (Gated GAN) to transfer multiple styles in a single model. The generative networks have three modules: an encoder, a gated transformer, and a decoder. Different styles can be achieved by passing input images through different branches of the gated transformer. To stabilize training, the encoder and decoder are combined as an autoencoder to reconstruct the input images. The discriminative networks are used to distinguish whether the input image is a stylized or genuine image. An auxiliary classifier is used to recognize the style categories of transferred images, thereby helping the generative networks generate images in multiple styles. In addition, Gated GAN makes it possible to explore a new style by investigating styles learned from artists or genres. Our extensive experiments demonstrate the stability and effectiveness of the proposed model for multistyle transfer.

URL

http://arxiv.org/abs/1904.02296

PDF

http://arxiv.org/pdf/1904.02296
Evaluating Style Transfer for Text

2019-04-04

Remi Mir, Bjarke Felbo, Nick Obradovich, Iyad Rahwan

arXiv_CL

arXiv_CL Sentiment Adversarial Style_Transfer Classification
Abstract

Research in the area of style transfer for text is currently bottlenecked by a lack of standard evaluation practices. This paper aims to alleviate this issue by experimentally identifying best practices with a Yelp sentiment dataset. We specify three aspects of interest (style transfer intensity, content preservation, and naturalness) and show how to obtain more reliable measures of them from human evaluation than in previous work. We propose a set of metrics for automated evaluation and demonstrate that they are more strongly correlated and in agreement with human judgment: direction-corrected Earth Mover’s Distance, Word Mover’s Distance on style-masked texts, and adversarial classification for the respective aspects. We also show that the three examined models exhibit tradeoffs between aspects of interest, demonstrating the importance of evaluating style transfer models at specific points of their tradeoff plots. We release software with our evaluation metrics to facilitate research.
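One of the metrics named above builds on Earth Mover's Distance. The sketch below computes the plain one-dimensional EMD between two distributions over ordered, unit-spaced bins, which equals the sum of absolute differences of their cumulative sums. It assumes both inputs have the same total mass, and the example distributions are hypothetical; the paper's direction correction is not implemented here.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth Mover's Distance between two histograms over the same ordered bins.

    Assumes unit spacing between neighbouring bins and equal total mass.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    # Running surplus of mass that still has to be moved to the right.
    surplus = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(s) for s in surplus)


# Hypothetical style distributions over (negative, positive) sentiment,
# e.g. classifier outputs for an input text and its style-transferred version.
source_style = [0.9, 0.1]
transferred_style = [0.2, 0.8]
intensity = emd_1d(source_style, transferred_style)

# Moving all mass across two bins costs two units.
far_apart = emd_1d([1, 0, 0], [0, 0, 1])
```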

URL

http://arxiv.org/abs/1904.02295

PDF

http://arxiv.org/pdf/1904.02295
Generative Adversarial Networks for text using word2vec intermediaries

2019-04-04

Akshay Budhkar, Krishnapriya Vishnubhotla, Safwan Hossain, Frank Rudzicz

arXiv_AI

arXiv_AI Adversarial GAN Embedding Language_Model
Abstract

Generative adversarial networks (GANs) have shown considerable success, especially in the realistic generation of images. In this work, we apply similar techniques for the generation of text. We propose a novel approach to handle the discrete nature of text, during training, using word embeddings. Our method is agnostic to vocabulary size and achieves competitive results relative to methods with various discrete gradient estimators.

URL

http://arxiv.org/abs/1904.02293

PDF

http://arxiv.org/pdf/1904.02293
A General Framework for Adversarial Examples with Objectives

2019-04-04

Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, Michael K. Reiter

arXiv_CV

arXiv_CV Adversarial Face Recognition Face_Recognition
Abstract

Images perturbed subtly to be misclassified by neural networks, called adversarial examples, have emerged as a technically deep challenge and an important concern for several application domains. Most research on adversarial examples takes as its only constraint that the perturbed images are similar to the originals. However, real-world application of these ideas often requires the examples to satisfy additional objectives, which are typically enforced through custom modifications of the perturbation process. In this paper, we propose adversarial generative nets (AGNs), a general methodology to train a generator neural network to emit adversarial examples satisfying desired objectives. We demonstrate the ability of AGNs to accommodate a wide range of objectives, including imprecise ones difficult to model, in two application domains. In particular, we demonstrate physical adversarial examples—eyeglass frames designed to fool face recognition—with better robustness, inconspicuousness, and scalability than previous approaches, as well as a new attack to fool a handwritten-digit classifier.

URL

http://arxiv.org/abs/1801.00349

PDF

http://arxiv.org/pdf/1801.00349
Answer-based Adversarial Training for Generating Clarification Questions

2019-04-04

Sudha Rao, Hal Daumé III

arXiv_CL

arXiv_CL Adversarial GAN
Abstract

We present an approach for generating clarification questions with the goal of eliciting new information that would make the given textual context more complete. We propose that modeling hypothetical answers (to clarification questions) as latent variables can guide our approach into generating more useful clarification questions. We develop a Generative Adversarial Network (GAN) where the generator is a sequence-to-sequence model and the discriminator is a utility function that models the value of updating the context with the answer to the clarification question. We evaluate on two datasets, using both automatic metrics and human judgments of usefulness, specificity and relevance, showing that our approach outperforms both a retrieval-based model and ablations that exclude the utility model and the adversarial training.

URL

http://arxiv.org/abs/1904.02281

PDF

http://arxiv.org/pdf/1904.02281
Continuous Direct Sparse Visual Odometry from RGB-D Images

2019-04-03

Maani Ghaffari, William Clark, Anthony Bloch, Ryan M. Eustice, Jessy W. Grizzle

arXiv_AI

arXiv_AI Sparse Tracking
Abstract

This paper reports on a novel formulation and evaluation of visual odometry from RGB-D images. Assuming a static scene, the developed theoretical framework generalizes the widely used direct energy formulation (photometric error minimization) technique for obtaining a rigid body transformation that aligns two overlapping RGB-D images to a continuous formulation. The continuity is achieved through functional treatment of the problem and representing the process models over RGB-D images in a reproducing kernel Hilbert space; consequently, the registration is not limited to the specific image resolution and the framework is fully analytical with a closed-form derivation of the gradient. We solve the problem by maximizing the inner product between two functions defined over RGB-D images, while the continuous action of the rigid body motion Lie group is captured through the integration of the flow in the corresponding Lie algebra. Energy-based approaches have been extremely successful and the developed framework in this paper shares many of their desired properties such as the parallel structure on both CPUs and GPUs, sparsity, semi-dense tracking, avoiding explicit data association which is computationally expensive, and possible extensions to the simultaneous localization and mapping frameworks. The evaluations on experimental data and comparison with the energy-based formulation of the problem confirm the effectiveness of the proposed technique, especially when the lack of structure and texture in the environment is evident.

URL

http://arxiv.org/abs/1904.02266

PDF

http://arxiv.org/pdf/1904.02266
Bi-Directional Differentiable Input Reconstruction for Low-Resource Neural Machine Translation

2019-04-03

Xing Niu, Weijia Xu, Marine Carpuat

arXiv_CL

arXiv_CL NMT
Abstract

We aim to better exploit the limited amounts of parallel text available in low-resource settings by introducing a differentiable reconstruction loss for neural machine translation (NMT). This loss compares original inputs to reconstructed inputs, obtained by back-translating translation hypotheses into the input language. We leverage differentiable sampling and bi-directional NMT to train models end-to-end, without introducing additional parameters. This approach achieves small but consistent BLEU improvements on four language pairs in both translation directions, and outperforms an alternative differentiable reconstruction strategy based on hidden states.

URL

https://arxiv.org/abs/1811.01116

PDF

https://arxiv.org/pdf/1811.01116
Towards Resisting Large Data Variations via Introspective Learning

2019-04-03

Yunhan Zhao, Ye Tian, Wei Shen, Alan Yuille

arXiv_CV

arXiv_CV Embedding CNN Classification
Abstract

Learning deep networks which can resist large variations between training and testing data is essential to build accurate and robust image classifiers. Towards this end, a typical strategy is to apply data augmentation to enlarge the training set. However, standard data augmentation is essentially a brute-force method which is inefficient, as it performs all the pre-defined transformations to every training sample. In this paper, we propose a principled approach to train networks with significantly improved resistance to large variations between training and testing data. This is achieved by embedding a learnable transformation module into the introspective network, which is a convolutional neural network (CNN) classifier empowered with generative capabilities. Our approach alternately synthesizes pseudo-negative samples with learned transformations and enhances the classifier by retraining it with synthesized samples. Experimental results verify that our approach significantly improves the ability of deep networks to resist large variations between training and testing data and achieves classification accuracy improvements on several benchmark datasets, including MNIST, affNIST, SVHN, CIFAR-10 and miniImageNet.

URL

http://arxiv.org/abs/1805.06447

PDF

http://arxiv.org/pdf/1805.06447
Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings

2019-04-03

Thomas Manzini, Yao Chong Lim, Yulia Tsvetkov, Alan W Black

arXiv_CL

arXiv_CL Embedding
Abstract

Online texts – across genres, registers, domains, and styles – are riddled with human stereotypes, expressed in overt or subtle ways. Word embeddings, trained on these texts, perpetuate and amplify these stereotypes, and propagate biases to machine learning models that use word embeddings as features. In this work, we propose a method to debias word embeddings in multiclass settings such as race and religion, extending the work of Bolukbasi et al. (2016) from the binary setting, such as binary gender. Next, we propose a novel methodology for the evaluation of multiclass debiasing. We demonstrate that our multiclass debiasing is robust and maintains the efficacy in standard NLP tasks.
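To make the idea of debiasing concrete, here is a minimal sketch of removing one direction from a word vector by projection, in the spirit of the binary setting the abstract refers to. The vectors and the bias direction are toy values, the function names are hypothetical, and the multiclass method proposed in the paper is not reproduced here.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [a / norm for a in v]


def remove_component(vec, direction):
    """Subtract from vec its projection onto the given direction."""
    unit = normalize(direction)
    scale = dot(vec, unit)
    return [a - scale * b for a, b in zip(vec, unit)]


# Toy 3-dimensional embedding and an assumed bias direction,
# e.g. the difference between the vectors of two group-defining words.
word_vec = [1.0, 2.0, 0.5]
bias_direction = [0.0, 2.0, 0.0]
debiased = remove_component(word_vec, bias_direction)
```

After the projection the vector has no component along the assumed bias direction, while its other coordinates are unchanged.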

URL

http://arxiv.org/abs/1904.04047

PDF

http://arxiv.org/pdf/1904.04047
Soft-bubble: A highly compliant dense geometry tactile sensor for robot manipulation

2019-04-03

Alex Alspach, Kunimatsu Hashimoto, Naveen Kuppuswamy, Russ Tedrake

arXiv_RO

arXiv_RO Pose_Estimation Tracking Classification
Abstract

Incorporating effective tactile sensing and mechanical compliance is key towards enabling robust and safe operation of robots in unknown, uncertain and cluttered environments. Towards realizing this goal, we present a lightweight, easy-to-build, highly compliant dense geometry sensor and end effector that comprises an inflated latex membrane with a depth sensor behind it. We present the motivations and the hardware design for this Soft-bubble and demonstrate its capabilities through example tasks including tactile-object classification, pose estimation and tracking, and nonprehensile object manipulation. We also present initial experiments to show the importance of high-resolution geometry sensing for tactile tasks and discuss applications in robust manipulation.

URL

http://arxiv.org/abs/1904.02252

PDF

http://arxiv.org/pdf/1904.02252
StereoDRNet: Dilated Residual Stereo Net

2019-04-03

Rohan Chabra, Julian Straub, Chris Sweeny, Richard Newcombe, Henry Fuchs

arXiv_CV

arXiv_CV CNN
Abstract

We propose a system that uses a convolutional neural network (CNN) to estimate depth from a stereo pair, followed by volumetric fusion of the predicted depth maps to produce a 3D reconstruction of a scene. Our proposed depth refinement architecture predicts view-consistent disparity and occlusion maps that help the fusion system produce geometrically consistent reconstructions. We utilize 3D dilated convolutions in our proposed cost filtering network, which yields better filtering while almost halving the computational cost in comparison to state-of-the-art cost filtering architectures. For feature extraction we use the Vortex Pooling architecture. The proposed method achieves state-of-the-art results on the KITTI 2012, KITTI 2015 and ETH 3D stereo benchmarks. Finally, we demonstrate that our system is able to produce high-fidelity 3D scene reconstructions that outperform the state-of-the-art stereo system.
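The dilated convolutions mentioned above enlarge the receptive field without adding weights. The sketch below shows this in one dimension for brevity (the paper uses 3D dilated convolutions); it computes a cross-correlation without padding, as deep learning frameworks usually do, and the signal and kernel are toy values chosen for illustration.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1D cross-correlation without padding, with gaps of `dilation` between kernel taps."""
    # Number of input samples covered by one output value (the receptive field).
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(kernel[k] * signal[start + k * dilation]
                       for k in range(len(kernel))))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # three weights in both cases

dense = dilated_conv1d(signal, kernel, dilation=1)    # each output sees 3 samples
dilated = dilated_conv1d(signal, kernel, dilation=2)  # each output sees 5 samples
```

With the same three weights, the dilated version compares samples four positions apart instead of two.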

URL

http://arxiv.org/abs/1904.02251

PDF

http://arxiv.org/pdf/1904.02251
Learning Outside the Box: Discourse-level Features Improve Metaphor Identification

2019-04-03

Jesse Mu, Helen Yannakoudakis, Ekaterina Shutova

arXiv_CL

arXiv_CL Embedding
Abstract

Most current approaches to metaphor identification use restricted linguistic contexts, e.g. by considering only a verb’s arguments or the sentence containing a phrase. Inspired by pragmatic accounts of metaphor, we argue that broader discourse features are crucial for better metaphor identification. We train simple gradient boosting classifiers on representations of an utterance and its surrounding discourse learned with a variety of document embedding methods, obtaining near state-of-the-art results on the 2018 VU Amsterdam metaphor identification task without the complex metaphor-specific features or deep neural architectures employed by other systems. A qualitative analysis further confirms the need for broader context in metaphor processing.

URL

http://arxiv.org/abs/1904.02246

PDF

http://arxiv.org/pdf/1904.02246
Multi-task Learning for Japanese Predicate Argument Structure Analysis

2019-04-03

Hikaru Omori, Mamoru Komachi

arXiv_CL

arXiv_CL Knowledge
Abstract

An event-noun is a noun that has an argument structure similar to a predicate. Recent works, including those considered state-of-the-art, ignore event-nouns or build a single model for solving both Japanese predicate argument structure analysis (PASA) and event-noun argument structure analysis (ENASA). However, because there are interactions between predicates and event-nouns, it is not sufficient to target only predicates. To address this problem, we present a multi-task learning method for PASA and ENASA. Our multi-task models improved the performance of both tasks compared to a single-task model by sharing knowledge from each task. Moreover, in PASA, our models achieved state-of-the-art results in overall F1 scores on the NAIST Text Corpus. In addition, this is the first work to employ neural networks in ENASA.

URL

http://arxiv.org/abs/1904.02244

PDF

http://arxiv.org/pdf/1904.02244
Unpaired Thermal to Visible Spectrum Transfer using Adversarial Training

2019-04-03

Adam Nyberg, Abdelrahman Eldesokey, David Bergström, David Gustafsson

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

Thermal Infrared (TIR) cameras are gaining popularity in many computer vision applications due to their ability to operate under low-light conditions. Images produced by TIR cameras are usually difficult for humans to perceive visually, which limits their usability. Several methods in the literature were proposed to address this problem by transforming TIR images into realistic visible spectrum (VIS) images. However, existing TIR-VIS datasets suffer from imperfect alignment between TIR-VIS image pairs, which degrades the performance of supervised methods. We tackle this problem by learning this transformation using an unsupervised Generative Adversarial Network (GAN) which trains on unpaired TIR and VIS images. When trained and evaluated on the KAIST-MS dataset, our proposed method was shown to produce significantly more realistic and sharp VIS images than the existing state-of-the-art supervised methods. In addition, our proposed method was shown to generalize very well when evaluated on a new dataset of new environments.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02242

PDF

http://arxiv.org/pdf/1904.02242
Read All
Speech Dereverberation Using Fully Convolutional Networks

2019-04-03

Ori Ernst, Shlomo E. Chazan, Sharon Gannot, Jacob Goldberger

arXiv_SD

arXiv_SD Adversarial GAN CNN
Abstract

Speech dereverberation using a single microphone is addressed in this paper. Motivated by the recent success of fully convolutional networks (FCN) in many image processing applications, we investigate their applicability to enhancing the speech signal represented by short-time Fourier transform (STFT) images. We present two variations: a “U-Net”, which is an encoder-decoder network with skip connections, and a generative adversarial network (GAN) with U-Net as generator, which yields a more intuitive cost function for training. To evaluate our method we used the data from the REVERB challenge, and compared our results to other methods under the same conditions. We have found that our method outperforms the competing methods in most cases.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1803.08243

PDF

http://arxiv.org/pdf/1803.08243
Read All
Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs

2019-04-03

Xuhao Chen

arXiv_CV

arXiv_CV Sparse CNN Optimization Inference
Abstract

Deep neural networks have achieved remarkable accuracy in many artificial intelligence applications, e.g. computer vision, at the cost of a large number of parameters and high computational complexity. Weight pruning can compress DNN models by removing redundant parameters in the networks, but it brings sparsity in the weight matrix, and therefore makes the computation inefficient on GPUs. Although pruning can remove more than 80% of the weights, it actually hurts inference performance (speed) when running models on GPUs. Two major problems cause this unsatisfactory performance on GPUs. First, lowering convolution onto matrix multiplication reduces data reuse opportunities and wastes memory bandwidth. Second, the sparsity brought by pruning makes the computation irregular, which leads to inefficiency when running on massively parallel GPUs. To overcome these two limitations, we propose Escort, an efficient method for sparse convolutional neural network inference on GPUs. Instead of using the lowering method, we choose to compute the sparse convolutions directly. We then orchestrate the parallelism and locality for the direct sparse convolution kernel, and apply customized optimization techniques to further improve performance. Evaluations on NVIDIA GPUs show that Escort can improve sparse convolution speed by 2.63x and 3.07x, and inference speed by 1.43x and 1.69x, compared to CUBLAS and CUSPARSE respectively.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1802.10280

PDF

http://arxiv.org/pdf/1802.10280
Read All
Hyperbolic Image Embeddings

2019-04-03

Valentin Khrulkov, Leyla Mirvakhabova, Evgeniya Ustinova, Ivan Oseledets, Victor Lempitsky

arXiv_CV

arXiv_CV Image_Retrieval Embedding Image_Classification Classification
Abstract

Computer vision tasks such as image classification, image retrieval and few-shot learning are currently dominated by Euclidean and spherical embeddings, so that the final decisions about class belongings or the degree of similarity are made using linear hyperplanes, Euclidean distances, or spherical geodesic distances (cosine similarity). In this work, we demonstrate that in many practical scenarios hyperbolic embeddings provide a better alternative.
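To make the contrast with Euclidean distance concrete, here is a minimal sketch of a hyperbolic distance. It assumes the Poincaré ball model, which the abstract does not name; the function name `poincare_distance` and the sample points are hypothetical illustrations, not the authors' code.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit ball.

    Assumption: the Poincare ball model of hyperbolic space; the
    abstract above only says "hyperbolic embeddings".
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Both points must lie strictly inside the unit ball.
    denom = (1.0 - sq_norm_u) * (1.0 - sq_norm_v)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


# The same Euclidean gap (0.1) is a longer hyperbolic distance
# near the boundary of the ball than near its centre.
near_centre = poincare_distance((0.0, 0.0), (0.1, 0.0))
near_boundary = poincare_distance((0.8, 0.0), (0.9, 0.0))
```

Distances grow rapidly towards the boundary of the ball, which is the property usually cited as making hyperbolic space a natural fit for hierarchical data.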

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02239

PDF

http://arxiv.org/pdf/1904.02239
Read All
Robust Multi-agent Counterfactual Prediction

2019-04-03

Alexander Peysakhovich, Christian Kroer, Adam Lerer

arXiv_AI

arXiv_AI Reinforcement_Learning Prediction
Abstract

We consider the problem of using logged data to make predictions about what would happen if we changed the ‘rules of the game’ in a multi-agent system. This task is difficult because in many cases we observe actions individuals take but not their private information or their full reward functions. In addition, agents are strategic, so when the rules change, they will also change their actions. Existing methods (e.g. structural estimation, inverse reinforcement learning) make counterfactual predictions by constructing a model of the game, adding the assumption that agents’ behavior comes from optimizing given some goals, and then inverting observed actions to learn agents’ underlying utility functions (a.k.a. types). Once the agent types are known, making counterfactual predictions amounts to solving for the equilibrium of the counterfactual environment. This approach imposes heavy assumptions such as rationality of the agents being observed, correctness of the analyst’s model of the environment/parametric form of the agents’ utility functions, and various other conditions to make point identification possible. We propose a method for analyzing the sensitivity of counterfactual conclusions to violations of these assumptions. We refer to this method as robust multi-agent counterfactual prediction (RMAC). We apply our technique to investigating the robustness of counterfactual claims for classic environments in market design: auctions, school choice, and social choice. Importantly, we show RMAC can be used in regimes where point identification is impossible (e.g. those which have multiple equilibria or non-injective maps from type distributions to outcomes).

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02235

PDF

http://arxiv.org/pdf/1904.02235
Read All
BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis

2019-04-03

Hu Xu, Bing Liu, Lei Shu, Philip S. Yu

arXiv_CL

arXiv_CL Sentiment Review Knowledge Sentiment_Classification Classification Language_Model
Abstract

Question-answering plays an important role in e-commerce as it allows potential customers to actively seek crucial information about products or services to help their purchase decision making. Inspired by the recent success of machine reading comprehension (MRC) on formal documents, this paper explores the potential of turning customer reviews into a large source of knowledge that can be exploited to answer user questions. We call this problem Review Reading Comprehension (RRC). To the best of our knowledge, no existing work has been done on RRC. In this work, we first build an RRC dataset called ReviewRC based on a popular benchmark for aspect-based sentiment analysis. Since ReviewRC has limited training examples for RRC (and also for aspect-based sentiment analysis), we then explore a novel post-training approach on the popular language model BERT to enhance the performance of fine-tuning of BERT for RRC. To show the generality of the approach, the proposed post-training is also applied to some other review-based tasks such as aspect extraction and aspect sentiment classification in aspect-based sentiment analysis. Experimental results demonstrate that the proposed post-training is highly effective. The datasets and code are available at https://www.cs.uic.edu/~hxu/.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02232

PDF

http://arxiv.org/pdf/1904.02232
Read All
The Effect of Downstream Classification Tasks for Evaluating Sentence Embeddings

2019-04-03

Peter Potash

arXiv_CL

arXiv_CL Embedding Classification Quantitative
Abstract

One popular method for quantitatively evaluating the performance of sentence embeddings involves their usage on downstream language processing tasks that require sentence representations as input. One simple such task is classification, where the sentence representations are used to train and test models on several classification datasets. We argue that by evaluating sentence representations in such a manner, the goal of the representations becomes learning a low-dimensional factorization of a sentence-task label matrix. We show how characteristics of this matrix can affect the ability of a low-dimensional factorization to perform as sentence representations in a suite of classification tasks. Primarily, sentences that have more labels across all possible classification tasks have a higher reconstruction loss, though this effect can be drastically negated if the number of such sentences is small.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02228

PDF

http://arxiv.org/pdf/1904.02228
Read All
Revisiting Visual Grounding

2019-04-03

Erik Conser, Kennedy Hahn, Chandler M. Watson, Melanie Mitchell

arXiv_CV

arXiv_CV Image_Retrieval Relation
Abstract

We revisit a particular visual grounding method: the “Image Retrieval Using Scene Graphs” (IRSG) system of Johnson et al. (2015). Our experiments indicate that the system does not effectively use its learned object-relationship models. We also look closely at the IRSG dataset, as well as the widely used Visual Relationship Dataset (VRD) that is adapted from it. We find that these datasets exhibit biases that allow methods that ignore relationships to perform relatively well. We also describe several other problems with the IRSG dataset, and report on experiments using a subset of the dataset in which the biases and other problems are removed. Our studies contribute to a more general effort: that of better understanding what machine learning methods that combine language and vision actually learn and what popular datasets actually test.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02225

PDF

http://arxiv.org/pdf/1904.02225
Read All
Learning Physics-Based Manipulation in Clutter: Combining Image-Based Generalization and Look-Ahead Planning

2019-04-03

Wissam Bejjani, Mehmet R. Dogar, Matteo Leonetti

arXiv_RO

arXiv_RO
Abstract

Physics-based manipulation in clutter involves complex interaction between multiple objects. In this paper, we consider the problem of learning, from interaction in a physics simulator, manipulation skills to solve this multi-step sequential decision making problem in the real world. Our approach has two key properties: (i) the ability to generalize (over the shape and number of objects in the scene) using an abstract image-based representation that enables a neural network to learn useful features; and (ii) the ability to perform look-ahead planning using a physics simulator, which is essential for such multi-step problems. We show, in sets of simulated and real-world experiments (video available on https://youtu.be/EmkUQfyvwkY), that by learning to evaluate actions in an abstract image-based representation of the real world, the robot can generalize and adapt to the object shapes in challenging real-world environments.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02223

PDF

http://arxiv.org/pdf/1904.02223
Read All
Language GANs Falling Short

2019-04-03

Massimo Caccia, Lucas Caccia, William Fedus, Hugo Larochelle, Joelle Pineau, Laurent Charlin

arXiv_CV

arXiv_CV Adversarial GAN Inference Prediction
Abstract

Generating high-quality text with sufficient diversity is essential for a wide range of Natural Language Generation (NLG) tasks. Maximum-Likelihood (MLE) models trained with teacher forcing have consistently been reported as weak baselines, where poor performance is attributed to exposure bias (Bengio et al., 2015; Ranzato et al., 2015); at inference time, the model is fed its own prediction instead of a ground-truth token, which can lead to accumulating errors and poor samples. This line of reasoning has led to an outbreak of adversarial based approaches for NLG, on the account that GANs do not suffer from exposure bias. In this work, we make several surprising observations which contradict common beliefs. First, we revisit the canonical evaluation framework for NLG, and point out fundamental flaws with quality-only evaluation: we show that one can outperform such metrics using a simple, well-known temperature parameter to artificially reduce the entropy of the model’s conditional distributions. Second, we leverage the control over the quality / diversity trade-off given by this parameter to evaluate models over the whole quality-diversity spectrum and find MLE models constantly outperform the proposed GAN variants over the whole quality-diversity space. Our results have several implications: 1) The impact of exposure bias on sample quality is less severe than previously thought, 2) temperature tuning provides a better quality / diversity trade-off than adversarial training while being easier to train, easier to cross-validate, and less computationally expensive. Code to reproduce the experiments is available at github.com/pclucas14/GansFallingShort
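The temperature mechanism mentioned above can be sketched in a few lines. This is an illustration only: the logits are made up, and the function names `softmax_with_temperature` and `entropy` are hypothetical, not taken from the authors' repository.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Made-up next-token logits for a three-word vocabulary.
logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, 0.5)   # temperature below 1: lower entropy
plain = softmax_with_temperature(logits, 1.0)   # the unmodified distribution
flat = softmax_with_temperature(logits, 2.0)    # temperature above 1: higher entropy
```

Sweeping the temperature in this way trades sample diversity for sample quality, which is the knob the abstract describes using to compare models across the whole quality-diversity spectrum.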

Abstract (translated by Google)

URL

https://arxiv.org/abs/1811.02549

PDF

https://arxiv.org/pdf/1811.02549
Read All
Decomposing Temperature Time Series with Non-Negative Matrix Factorization

2019-04-03

Peter Weiderer, Ana Maria Tomé, Elmar Wolfgang Lang

arXiv_CV

arXiv_CV Knowledge
Abstract

During the fabrication of casting parts, sensor data is typically automatically recorded and accumulated for process monitoring and defect diagnosis. As casting is a thermal process with many interacting process parameters, root cause analysis tends to be tedious and ineffective. We show how a decomposition based on non-negative matrix factorization (NMF), which is guided by a knowledge-based initialization strategy, is able to extract physically meaningful sources from temperature time series collected during a thermal manufacturing process. The approach assumes the time series to be generated by a superposition of several simultaneously acting component processes. NMF is able to reverse the superposition and to identify the hidden component processes. The latter can be linked to ongoing physical phenomena and process variables, which cannot be monitored directly. Our approach provides new insights into the underlying physics and offers a tool which can assist in diagnosing defect causes. We demonstrate our method by applying it to real-world data, collected in a foundry during the series production of casting parts for the automobile industry.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02217

PDF

http://arxiv.org/pdf/1904.02217
Read All
DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation

2019-04-03

Hanchao Li, Pengfei Xiong, Haoqiang Fan, Jian Sun

arXiv_CV

arXiv_CV Segmentation Semantic_Segmentation
Abstract

This paper introduces an extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints. Our proposed network starts from a single lightweight backbone and aggregates discriminative features through sub-network and sub-stage cascade respectively. Based on the multi-scale feature propagation, DFANet substantially reduces the number of parameters, but still obtains sufficient receptive field and enhances the model learning ability, which strikes a balance between the speed and segmentation performance. Experiments on Cityscapes and CamVid datasets demonstrate the superior performance of DFANet, with 8$\times$ fewer FLOPs and 2$\times$ faster speed than the existing state-of-the-art real-time semantic segmentation methods while providing comparable accuracy. Specifically, it achieves 70.3% Mean IOU on the Cityscapes test dataset with only 1.7 GFLOPs and a speed of 160 FPS on one NVIDIA Titan X card, and 71.3% Mean IOU with 3.4 GFLOPs while inferring on a higher resolution image.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02216

PDF

http://arxiv.org/pdf/1904.02216
Read All
Experimental Comparison of Open Source Visual-Inertial-Based State Estimation Algorithms in the Underwater Domain

2019-04-03

Bharat Joshi, Sharmin Rahman, Michail Kalaitzakis, Brennan Cain, James Johnson, Marios Xanthidis, Nare Karapetyan, Alan Hernandez, Alberto Quattrini Li, Nikolaos Vitzilaios, Ioannis Rekleitis

arXiv_RO

arXiv_RO Optimization
Abstract

A plethora of state estimation techniques have appeared in the last decade using visual data, and more recently with added inertial data. Datasets typically used for evaluation include indoor and urban environments, where supporting videos have shown impressive performance. However, such techniques have not been fully evaluated in challenging conditions, such as the marine domain. In this paper, we compare ten recent open-source packages to provide insights on their performance and guidelines on addressing current challenges. Specifically, we selected direct methods and tightly-coupled optimization techniques that fuse camera and Inertial Measurement Unit (IMU) data together. Experiments are conducted by testing all packages on datasets collected over the years with underwater robots in our laboratory. All the datasets are made available online.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02215

PDF

http://arxiv.org/pdf/1904.02215
Read All
Massively Multilingual Adversarial Speech Recognition

2019-04-03

Oliver Adams, Matthew Wiesner, Shinji Watanabe, David Yarowsky

arXiv_CL

arXiv_CL Adversarial Speech_Recognition Classification Recognition
Abstract

We report on adaptation of multilingual end-to-end speech recognition models trained on as many as 100 languages. Our findings shed light on the relative importance of similarity between the target and pretraining languages along the dimensions of phonetics, phonology, language family, geographical location, and orthography. In this context, experiments demonstrate the effectiveness of two additional pretraining objectives in encouraging language-independent encoder representations: a context-independent phoneme objective paired with a language-adversarial classification objective.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02210

PDF

http://arxiv.org/pdf/1904.02210
Read All
The Green Choice: Learning and Influencing Human Decisions on Shared Roads

2019-04-03

Erdem Bıyık, Daniel A. Lazar, Dorsa Sadigh, Ramtin Pedarsani

arXiv_RO

arXiv_RO Optimization
Abstract

Autonomous vehicles have the potential to increase the capacity of roads via platooning, even when human drivers and autonomous vehicles share roads. However, when users of a road network choose their routes selfishly, the resulting traffic configuration may be very inefficient. Because of this, we consider how to influence human decisions so as to decrease congestion on these roads. We consider a network of parallel roads with two modes of transportation: (i) human drivers who will choose the quickest route available to them, and (ii) a ride hailing service that provides an array of autonomous vehicle ride options, each with different prices, to users. In this work, we seek to design these prices so that when autonomous service users choose from these options and human drivers selfishly choose their resulting routes, road usage is maximized and transit delay is minimized. To do so, we formalize a model of how autonomous service users make choices between routes with different price/delay values. Developing a preference-based algorithm to learn the preferences of the users, and using a vehicle flow model related to the Fundamental Diagram of Traffic, we formulate a planning optimization to maximize a social objective and demonstrate the benefit of the proposed routing and learning scheme.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02209

PDF

http://arxiv.org/pdf/1904.02209
Read All
Formulating Camera-Adaptive Color Constancy as a Few-shot Meta-Learning Problem

2019-04-03

Steven McDonagh, Sarah Parisot, Fengwei Zhou, Xing Zhang, Ales Leonardis, Zhenguo Li, Gregory Slabaugh

arXiv_CV

arXiv_CV Quantitative
Abstract

Digital camera pipelines employ color constancy methods to estimate an unknown scene illuminant, in order to re-illuminate images as if they were acquired under an achromatic light source. Fully-supervised learning approaches exhibit state-of-the-art estimation accuracy with camera-specific labelled training imagery. Resulting models typically suffer from domain gaps and fail to generalise across imaging devices. In this work, we propose a new approach that affords fast adaptation to previously unseen cameras, and robustness to changes in capture device by leveraging annotated samples across different cameras and datasets. We present a general approach that utilizes the concept of color temperature to frame color constancy as a set of distinct, homogeneous few-shot regression tasks, each associated with an intuitive physical meaning. We integrate this novel formulation within a meta-learning framework, enabling fast generalisation to previously unseen cameras using only handfuls of camera specific training samples. Consequently, the time spent for data collection and annotation substantially diminishes in practice whenever a new sensor is used. To quantify this gain, we evaluate our pipeline on three publicly available datasets comprising 12 different cameras and diverse scene content. Our approach delivers competitive results both qualitatively and quantitatively while requiring a small fraction of the camera-specific samples compared to standard approaches.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1811.11788

PDF

http://arxiv.org/pdf/1811.11788
Read All
Linearly Converging Quasi Branch and Bound Algorithms for Global Rigid Registration

2019-04-03

Nadav Dym, Shahar Ziv Kovalsky

arXiv_CV

arXiv_CV
Abstract

In recent years, several branch-and-bound (BnB) algorithms have been proposed to globally optimize rigid registration problems. In this paper, we suggest a general framework to improve upon the BnB approach, which we name Quasi BnB. Quasi BnB replaces the linear lower bounds used in BnB algorithms with quadratic quasi-lower bounds which are based on the quadratic behavior of the energy in the vicinity of the global minimum. While quasi-lower bounds are not truly lower bounds, the Quasi-BnB algorithm is globally optimal. In fact, we prove that it exhibits linear convergence: it achieves $\epsilon$-accuracy in $O(\log(1/\epsilon))$ time, while the time complexity of other rigid registration BnB algorithms is polynomial in $1/\epsilon$. Our experiments verify that Quasi-BnB is significantly more efficient than state-of-the-art BnB algorithms, especially for problems where high accuracy is desired.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02204

PDF

http://arxiv.org/pdf/1904.02204
Read All
Semantics-Aware Image to Image Translation and Domain Transfer

2019-04-03

Pravakar Roy, Nicolai Häni, Volkan Isler

arXiv_CV

arXiv_CV Adversarial Segmentation GAN Quantitative
Abstract

Image-to-image translation is the problem of transferring an image from a source domain to a target domain. We present a new method to transfer the underlying semantics of an image even when there are geometric changes across the two domains. Specifically, we present a Generative Adversarial Network (GAN) that can transfer semantic information presented as segmentation masks. Our main technical contribution is an encoder-decoder based generator architecture that jointly encodes the image and its underlying semantics and translates both simultaneously to the target domain. Additionally, we propose object transfiguration and cross-domain semantic consistency losses that preserve the underlying semantic label maps. We demonstrate the effectiveness of our approach in multiple object transfiguration and domain transfer tasks through qualitative and quantitative experiments. The results show that our method is better at transferring image semantics than state-of-the-art image-to-image translation methods.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02203

PDF

http://arxiv.org/pdf/1904.02203
Read All
Learning When Not to Answer: A Ternary Reward Structure for Reinforcement Learning based Question Answering

2019-04-03

Fréderic Godin, Anjishnu Kumar, Arpit Mittal

arXiv_CL

arXiv_CL Knowledge_Graph Knowledge Reinforcement_Learning
Abstract

In this paper, we investigate the challenges of using reinforcement learning agents for question-answering over knowledge graphs for real-world applications. We examine the performance metrics used by state-of-the-art systems and determine that they are inadequate for such settings. More specifically, they do not evaluate the systems correctly for situations when there is no answer available, and thus agents optimized for these metrics are poor at modeling confidence. We introduce a simple new performance metric for evaluating question-answering agents that is more representative of practical usage conditions, and optimize for this metric by extending the binary reward structure used in prior work to a ternary reward structure which also rewards an agent for not answering a question rather than giving an incorrect answer. We show that this can drastically improve the precision of answered questions while leaving unanswered only a limited number of previously correctly answered questions. Employing a supervised learning strategy using depth-first-search paths to bootstrap the reinforcement learning algorithm further improves performance.
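A binary and a ternary reward can be contrasted with a short sketch. The numeric values (+1, 0, -1), the use of `None` to mean "no answer", and both function names are assumptions made for illustration; the abstract does not state them.

```python
def binary_reward(predicted, gold):
    """Reward 1.0 for a correct answer and 0.0 otherwise (values assumed)."""
    return 1.0 if predicted is not None and predicted == gold else 0.0


def ternary_reward(predicted, gold):
    """Hypothetical ternary scheme: correct > abstain > incorrect.

    `None` stands for "the agent chose not to answer"; the concrete
    values +1, 0 and -1 are illustrative assumptions.
    """
    if predicted is None:
        return 0.0
    return 1.0 if predicted == gold else -1.0


# Under the binary scheme a wrong answer costs no more than abstaining,
# so the agent has no incentive to hold back when it is unsure.
wrong_binary = binary_reward("Paris", "Berlin")
abstain_binary = binary_reward(None, "Berlin")
wrong_ternary = ternary_reward("Paris", "Berlin")
abstain_ternary = ternary_reward(None, "Berlin")
```

Ranking abstention strictly above a wrong answer is what gives the agent a reason to withhold low-confidence answers.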

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.10236

PDF

http://arxiv.org/pdf/1902.10236
Read All
PaintBot: A Reinforcement Learning Approach for Natural Media Painting

2019-04-03

Biao Jia, Chen Fang, Jonathan Brandt, Byungmoon Kim, Dinesh Manocha

arXiv_CV

arXiv_CV Reinforcement_Learning Optimization
Abstract

We propose a new automated digital painting framework, based on a painting agent trained through reinforcement learning. To synthesize an image, the agent selects a sequence of continuous-valued actions representing primitive painting strokes, which are accumulated on a digital canvas. Action selection is guided by a given reference image, which the agent attempts to replicate subject to the limitations of the action space and the agent’s learned policy. The painting agent policy is determined using a variant of proximal policy optimization reinforcement learning. During training, our agent is presented with patches sampled from an ensemble of reference images. To accelerate training convergence, we adopt a curriculum learning strategy, whereby reference patches are sampled according to how challenging they are using the current policy. We experiment with differing loss functions, including pixel-wise and perceptual loss, which have consequent differing effects on the learned policy. We demonstrate that our painting agent can learn an effective policy with a high dimensional continuous action space comprising pen pressure, width, tilt, and color, for a variety of painting styles. Through a coarse-to-fine refinement process our agent can paint arbitrarily complex images in the desired style.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02201

PDF

http://arxiv.org/pdf/1904.02201
Read All
3D-BEVIS: Birds-Eye-View Instance Segmentation

2019-04-03

Cathrin Elich, Francis Engelmann, Jonas Schult, Theodora Kontogianni, Bastian Leibe

arXiv_CV

arXiv_CV Semantic_Instance_Segmentation Segmentation Embedding Semantic_Segmentation Classification Deep_Learning
Abstract

Recent deep learning models achieve impressive results on 3D scene analysis tasks by operating directly on unstructured point clouds. A lot of progress was made in the field of object classification and semantic segmentation. However, the task of instance segmentation is less explored. In this work, we present 3D-BEVIS, a deep learning framework for 3D semantic instance segmentation on point clouds. Following the idea of previous proposal-free instance segmentation approaches, our model learns a feature embedding and groups the obtained feature space into semantic instances. Current point-based methods scale linearly with the number of points by processing local sub-parts of a scene individually. However, to perform instance segmentation by clustering, globally consistent features are required. Therefore, we propose to combine local point geometry with global context information from an intermediate bird’s-eye view representation.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.02199

PDF

http://arxiv.org/pdf/1904.02199
Read All
Text normalization using memory augmented neural networks

2019-04-03

Subhojeet Pramanik, Aman Hussain

arXiv_CL

arXiv_CL RNN
Abstract

We perform text normalization, i.e. the transformation of words from the written to the spoken form, using a memory augmented neural network. With the addition of a dynamic memory access and storage mechanism, we present a neural architecture that will serve as a language-agnostic text normalization system while avoiding the kind of unacceptable errors made by LSTM-based recurrent neural networks. By successfully reducing the frequency of such mistakes, we show that this novel architecture is indeed a better alternative. Our proposed system requires significantly less data, training time and compute resources. Additionally, we perform data up-sampling, circumventing the data sparsity problem in some semiotic classes, to show that sufficient examples in any particular class can improve the performance of our text normalization system. Although a few occurrences of these errors still remain in certain semiotic classes, we demonstrate that memory augmented networks with meta-learning capabilities can open many doors to a superior text normalization system.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1806.00044

PDF

http://arxiv.org/pdf/1806.00044
Read All
Understanding the efficacy, reliability and resiliency of computer vision techniques for malware detection and future research directions

2019-04-03

Li Chen

arXiv_CV

arXiv_CV Object_Detection Face Image_Classification Transfer_Learning Classification Detection
Abstract

My research lies at the intersection of security and machine learning. This overview summarizes one component of my research: combining computer vision with malware exploit detection for enhanced security solutions. I will present the perspectives of efficacy, reliability and resiliency to formulate threat detection as computer vision problems and develop state-of-the-art image-based malware classification. Representing malware binaries as images provides a direct visualization of data samples, reduces the effort required for feature extraction, and consumes the whole binary for holistic structural analysis. Applying transfer learning from deep neural networks that are effective for large-scale image classification to malware classification demonstrates superior classification efficacy compared with classical machine learning algorithms. To enhance the reliability of these vision-based malware detectors, interpretation frameworks can be constructed on the malware visual representations and are useful for extracting faithful explanations, so that security practitioners have confidence in the model before deployment. In cyber-security applications, we should always assume that a malware writer constantly modifies code to bypass detection. Addressing the resiliency of the malware detectors is as important as efficacy and reliability. By understanding the attack surfaces of machine learning models used for malware detection, we can greatly improve the robustness of the algorithms to combat malware adversaries in the wild. Finally, I will discuss future research directions worth pursuing in this research community.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.10504

PDF

http://arxiv.org/pdf/1904.10504
Read All
Probing Biomedical Embeddings from Language Models

2019-04-03

Qiao Jin, Bhuwan Dhingra, William W. Cohen, Xinghua Lu

arXiv_CL

arXiv_CL Embedding Language_Model Relation
Abstract

Contextualized word embeddings derived from pre-trained language models (LMs) show significant improvements on downstream NLP tasks. Pre-training on domain-specific corpora, such as biomedical articles, further improves their performance. In this paper, we conduct probing experiments to determine what additional information is carried intrinsically by the in-domain trained contextualized embeddings. For this we use the pre-trained LMs as fixed feature extractors and restrict the downstream task models to not have additional sequence modeling layers. We compare BERT, ELMo, BioBERT and BioELMo, a biomedical version of ELMo trained on 10M PubMed abstracts. Surprisingly, while fine-tuned BioBERT is better than BioELMo in biomedical NER and NLI tasks, as a fixed feature extractor BioELMo outperforms BioBERT in our probing tasks. We use visualization and nearest neighbor analysis to show that better encoding of entity-type and relational information leads to this superiority.

URL

http://arxiv.org/abs/1904.02181

PDF

http://arxiv.org/pdf/1904.02181
Constrained Generative Adversarial Networks for Interactive Image Generation

2019-04-03

Eric Heim

arXiv_CV

arXiv_CV Adversarial Attention GAN Relation
Abstract

Generative Adversarial Networks (GANs) have received a great deal of attention due in part to recent success in generating original, high-quality samples from visual domains. However, most current methods only allow users to guide this image generation process through limited interactions. In this work we develop a novel GAN framework that allows humans to be “in-the-loop” of the image generation process. Our technique iteratively accepts relative constraints of the form “Generate an image more like image A than image B”. After each constraint is given, the user is presented with new outputs from the GAN, informing the next round of feedback. This feedback is used to constrain the output of the GAN with respect to an underlying semantic space that can be designed to model a variety of different notions of similarity (e.g. classes, attributes, object relationships, color, etc.). In our experiments, we show that our GAN framework is able to generate images that are of comparable quality to equivalent unsupervised GANs while satisfying a large number of the constraints provided by users, effectively changing a GAN into one that gives users interactive control over image generation without sacrificing image quality.
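
One way to read a relative constraint of the form "more like image A than image B" is as a distance comparison in the semantic space. The sketch below assumes Euclidean distance between hypothetical embedding vectors; the abstract does not say which distance or embedding the paper uses, so treat every name here as illustrative.

```python
import math


def satisfies(candidate, a, b):
    """True if `candidate` is closer to `a` than to `b` (assumed Euclidean distance)."""
    return math.dist(candidate, a) < math.dist(candidate, b)


def fraction_satisfied(candidate, constraints):
    """Share of (a, b) constraints that the candidate embedding satisfies."""
    if not constraints:
        return 1.0
    met = sum(1 for a, b in constraints if satisfies(candidate, a, b))
    return met / len(constraints)


# Hypothetical 2-d semantic embeddings of a generated image and reference images.
generated = (0.0, 0.0)
user_constraints = [
    ((1.0, 0.0), (3.0, 0.0)),  # "more like A than B": satisfied
    ((5.0, 5.0), (1.0, 1.0)),  # not satisfied, the candidate is nearer to B
]
score = fraction_satisfied(generated, user_constraints)
```

A score like this matches the abstract's notion of "satisfying a large number of the constraints", though the paper's own evaluation protocol may differ.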

URL

http://arxiv.org/abs/1904.02526

PDF

http://arxiv.org/pdf/1904.02526
Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Autoencoders

2019-04-03

Andrew Drozdov, Pat Verga, Mohit Yadav, Mohit Iyyer, Andrew McCallum

arXiv_CL

arXiv_CL
Abstract

We introduce deep inside-outside recursive autoencoders (DIORA), a fully-unsupervised method for discovering syntax that simultaneously learns representations for constituents within the induced tree. Our approach predicts each word in an input sentence conditioned on the rest of the sentence and uses inside-outside dynamic programming to consider all possible binary trees over the sentence. At test time the CKY algorithm extracts the highest scoring parse. DIORA achieves a new state-of-the-art F1 in unsupervised binary constituency parsing (unlabeled) in two benchmark datasets, WSJ and MultiNLI.
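
The test-time step described above, where CKY extracts the highest-scoring parse, can be sketched as a dynamic program over spans. In this minimal example the span scores are hand-written toy numbers and the tree score is assumed to be the sum of its span scores; DIORA's scores are learned, and its exact scoring function is not given in the abstract.

```python
def cky_best_tree(n, span_score):
    """Find the binary tree over n words that maximizes the sum of span scores.

    `span_score(i, j)` scores the span covering words i..j-1 (j exclusive).
    Returns (best_total_score, tree), where leaves are word indices and
    internal nodes are (left, right) tuples.
    """
    best, back = {}, {}
    for i in range(n):
        best[(i, i + 1)] = span_score(i, i + 1)
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Choose the split point whose two sub-spans score highest.
            k = max(range(i + 1, j), key=lambda s: best[(i, s)] + best[(s, j)])
            best[(i, j)] = span_score(i, j) + best[(i, k)] + best[(k, j)]
            back[(i, j)] = k

    def build(i, j):
        if j - i == 1:
            return i
        k = back[(i, j)]
        return (build(i, k), build(k, j))

    return best[(0, n)], build(0, n)


# Toy scores (assumption): only the spans (0, 2) and (2, 4) are rewarded.
toy_scores = {(0, 2): 1.0, (2, 4): 1.0}
total, tree = cky_best_tree(4, lambda i, j: toy_scores.get((i, j), 0.0))
```

With these toy scores the best tree pairs the first two and the last two words, since that is the only bracketing that contains both rewarded spans.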

URL

https://arxiv.org/abs/1904.02142

PDF

https://arxiv.org/pdf/1904.02142
