360° images represent scenes captured in all possible viewing directions. They enable viewers to navigate freely around the scene and thus provide an immersive experience. In contrast, conventional images represent scenes from a single viewing direction with a small or limited field of view, so only part of the scene is observed and valuable information about the surroundings is lost. We propose a learning-based approach that reconstructs the scene in 360° x 180° from conventional images. The approach first estimates the field of view of the input images relative to the panorama. The estimated field of view is then used as the prior for synthesizing a high-resolution 360° panoramic output. Experimental results demonstrate that our approach outperforms alternative methods and is robust enough to synthesize panoramas from real-world data (e.g., scenes captured using smartphones).
http://arxiv.org/abs/1904.03326
In scenarios where a robot generates and executes a plan, the generated plan may be less costly for the robot to execute but incomprehensible to the human. When the human acts as a supervisor and is held accountable for the robot's plan, the human may be at higher risk if the incomprehensible behavior is deemed unsafe. In such cases, the robot, which may be unaware of the human's exact expectations, may choose to either (1) follow the most constrained plan (i.e., one preferred by all possible supervisors), incurring the added cost of executing highly sub-optimal behavior when the human is observing it, or (2) deviate to a more optimal plan when the human looks away. These problems are amplified in situations where the robot has to fulfill multiple goals and cater to the needs of different human supervisors. In such settings, the robot, being a rational agent, should take any chance it gets to deviate to a lower-cost plan. On the other hand, continuous monitoring of the robot's behavior is often difficult for the human because it costs valuable resources (e.g., time, effort, cognitive overload). To optimize the cost of constant monitoring while ensuring the robot follows {\em safe} behavior, we model this problem in a game-theoretic framework of trust in which the human is the agent that trusts the robot. We show that the notion of the human's trust, which is well-defined when there is a pure-strategy equilibrium, is inversely proportional to the probability the human assigns to observing the robot's behavior. We then show that, with high probability, our game lacks a pure-strategy Nash equilibrium, forcing us to define a trust boundary over the human's mixed strategies in order to guarantee safe behavior by the robot.
http://arxiv.org/abs/1903.00111
Contextual word embedding models such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018) have dramatically improved performance for many natural language processing (NLP) tasks in recent months. However, these models have been minimally explored on specialty corpora, such as clinical text; moreover, in the clinical domain, no publicly-available pre-trained BERT models yet exist. In this work, we address this need by exploring and releasing BERT models for clinical text: one for generic clinical text and another for discharge summaries specifically. We demonstrate that using a domain-specific model yields performance improvements on three common clinical NLP tasks as compared to nonspecific embeddings. We find that these domain-specific models are not as performant on two clinical de-identification tasks, and we argue that this is a natural consequence of the differences between de-identified source text and synthetically non de-identified task text.
http://arxiv.org/abs/1904.03323
In this paper, we quantify, analyze and mitigate gender bias exhibited in ELMo’s contextualized word vectors. First, we conduct several intrinsic analyses and find that (1) training data for ELMo contains significantly more male than female entities, (2) the trained ELMo embeddings systematically encode gender information and (3) ELMo unequally encodes gender information about male and female entities. Then, we show that a state-of-the-art coreference system that depends on ELMo inherits its bias and demonstrates significant bias on the WinoBias probing corpus. Finally, we explore two methods to mitigate such gender bias and show that the bias demonstrated on WinoBias can be eliminated.
http://arxiv.org/abs/1904.03310
We present an end-to-end deep Convolutional Neural Network called Convolutional Relational Machine (CRM) for recognizing group activities that utilizes the information in spatial relations between individual persons in images or videos. It learns to produce an intermediate spatial representation (activity map) based on individual and group activities. A multi-stage refinement component is responsible for reducing incorrect predictions in the activity map. Finally, an aggregation component uses the refined information to recognize group activities. Experimental results demonstrate the constructive contribution of the information extracted and represented in the form of the activity map. CRM shows advantages over state-of-the-art models on the Volleyball and Collective Activity datasets.
http://arxiv.org/abs/1904.03308
In this paper, we explore a new approach to named entity recognition (NER) with the goal of learning from context and fragment features more effectively, contributing to the improvement of overall recognition performance. We use the recent fixed-size ordinally forgetting encoding (FOFE) method to fully encode each sentence fragment and its left and right contexts into fixed-size representations. Next, we organize the context and fragment features into groups and feed each feature group to dedicated fully-connected layers. Finally, we merge each group's final dedicated layers and add a shared layer leading to a single output. Our experiments show that, given only tokenized text and trained word embeddings, our system outperforms our baseline models and is competitive with the state of the art on various well-known NER tasks.
http://arxiv.org/abs/1904.03305
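FOFE itself is simple enough to sketch in a few lines. The snippet below (NumPy, with an illustrative forgetting factor of 0.7 and a toy vocabulary) shows how a variable-length token sequence is folded into a single fixed-size vector; how the context and fragment codes are grouped and fed to dedicated layers is the paper's contribution and is not reproduced here.

```python
import numpy as np

def fofe_encode(token_ids, vocab_size, alpha=0.7):
    """Fixed-size Ordinally Forgetting Encoding of a token sequence.

    Each token contributes a one-hot vector; the running code decays by the
    forgetting factor alpha, so earlier tokens are weighted by higher powers
    of alpha and the whole (variable-length) sequence maps to one vector."""
    z = np.zeros(vocab_size)
    for t in token_ids:
        z *= alpha          # z_t = alpha * z_{t-1} + e_{w_t}
        z[t] += 1.0
    return z

# Toy usage: encode a fragment and its left context separately, then
# concatenate the fixed-size codes as input features for the NER classifier.
left_code = fofe_encode([3, 1, 4], vocab_size=10)
frag_code = fofe_encode([1, 5], vocab_size=10)
features = np.concatenate([left_code, frag_code])
```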
In this work, we continue our research on the i-vector extractor for speaker verification (SV) and optimize its architecture for fast and effective discriminative training. We were motivated by the computational and memory requirements caused by the large number of parameters of the original generative i-vector model. Our aim is to preserve the power of the original generative model while focusing the model towards extraction of speaker-related information. We show that it is possible to represent a standard generative i-vector extractor with a model that has significantly fewer parameters and obtain similar performance on SV tasks. We can further refine this compact model by discriminative training and obtain i-vectors that lead to better performance on various SV benchmarks representing different acoustic domains.
http://arxiv.org/abs/1904.04235
Many 3D vision systems localize cameras within a scene using 3D point clouds. Such point clouds are often obtained using structure from motion (SfM), after which the images are discarded to preserve privacy. In this paper, we show, for the first time, that such point clouds retain enough information to reveal scene appearance and compromise privacy. We present a privacy attack that reconstructs color images of the scene from the point cloud. Our method is based on a cascaded U-Net that takes as input a 2D multichannel image of the points rendered from a specific viewpoint, containing point depth and, optionally, color and SIFT descriptors, and outputs a color image of the scene from that viewpoint. Unlike previous feature inversion methods, we deal with highly sparse and irregular 2D point distributions and with inputs where many point attributes are missing, namely keypoint orientation and scale, the descriptor image source, and the 3D point visibility. We evaluate our attack algorithm on public datasets and analyze the significance of the point cloud attributes. Finally, we show that novel views can also be generated, thereby enabling compelling virtual tours of the underlying scene.
http://arxiv.org/abs/1904.03303
Recurrent neural networks (RNNs) have shown state-of-the-art results for speech recognition, natural language processing, image captioning, and video summarization applications. Many of these applications run on low-power platforms, so their energy efficiency is extremely important. We observed that cache-oblivious RNN scheduling during inference typically results in 30-50x more data transferred on and off the CPU than the application's working set size, which can potentially hurt energy efficiency. This paper presents a new metric called Data Reuse Efficiency (DRE) to gauge the RNN scheduling efficiency of a platform and shows the factors that influence its value. Additionally, this paper discusses an optimization to improve reuse in RNNs and highlights the positive impact of this optimization on the total amount of memory read from or written to the memory controller (and, hence, the DRE value) during the execution of an RNN application on a mobile SoC.
https://arxiv.org/abs/1904.03302
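The abstract does not give a formula for DRE, but one plausible reading, consistent with the 30-50x figure above, is the ratio of the application's working-set size to the total data actually moved on and off the CPU. A hypothetical sketch (the name and exact definition are assumptions):

```python
def data_reuse_efficiency(working_set_bytes: float, bytes_transferred: float) -> float:
    """Hypothetical DRE: 1.0 means the working set is streamed exactly once;
    smaller values mean the same data is re-fetched from DRAM many times."""
    return working_set_bytes / bytes_transferred

# Example consistent with the 30-50x observation cited in the abstract:
print(data_reuse_efficiency(working_set_bytes=8e6, bytes_transferred=8e6 * 40))  # 0.025
```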
Named entity recognition (NER) systems that perform well require task-related, manually annotated datasets. However, such datasets are expensive to develop and are thus limited in size. As there already exists a large number of NER datasets that share a certain degree of relationship but differ in content, it is important to explore whether such datasets can be combined as a simple method for improving NER performance. To investigate this, we developed a novel locally detecting multitask model using FFNNs. The model relies on encoding variable-length sequences of words into theoretically lossless and unique fixed-size representations. We applied this method to several well-known NER tasks and compared the results of our model to baseline models as well as other published results. We observed competitive performance in nearly all of the tasks.
http://arxiv.org/abs/1904.03300
We introduce a general framework for several information extraction tasks that share span representations using dynamically constructed span graphs. The graphs are constructed by selecting the most confident entity spans and linking these nodes with confidence-weighted relation types and coreferences. The dynamic span graph allows coreference and relation type confidences to propagate through the graph to iteratively refine the span representations. This is unlike previous multi-task frameworks for information extraction in which the only interaction between tasks is in the shared first-layer LSTM. Our framework significantly outperforms the state-of-the-art on multiple information extraction tasks across multiple datasets reflecting different domains. We further observe that the span enumeration approach is good at detecting nested span entities, with significant F1 score improvement on the ACE dataset.
http://arxiv.org/abs/1904.03296
Policy gradient algorithms typically combine discounted future rewards with an estimated value function to compute the direction and magnitude of parameter updates. However, for most Reinforcement Learning tasks, humans can provide additional insight to constrain the policy learning. We introduce a general method to incorporate multiple different feedback channels into a single policy gradient loss. In our formulation, the Multi-Preference Actor Critic (M-PAC), these different types of feedback are implemented as constraints on the policy. We use a Lagrangian relaxation to satisfy these constraints using gradient descent while learning a policy that maximizes rewards. Experiments in Atari and Pendulum verify that constraints are being respected and can accelerate the learning process.
http://arxiv.org/abs/1904.03295
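The Lagrangian-relaxation idea above can be sketched in a few lines of PyTorch. The constraint form, tolerance, and optimizer choices below are illustrative assumptions; in the paper the constraints come from the different human feedback channels.

```python
import torch

def mpac_objective(policy_loss, constraint_costs, log_lambdas, eps=0.0):
    """Lagrangian-relaxed policy objective (illustrative form).

    policy_loss      : scalar actor-critic loss to be minimized
    constraint_costs : per-feedback-channel costs J_i(pi), each required
                       to stay below the tolerance eps
    log_lambdas      : learnable log Lagrange multipliers (exp keeps them > 0)
    """
    lambdas = log_lambdas.exp()
    violations = torch.clamp(constraint_costs - eps, min=0.0)
    return policy_loss + (lambdas * violations).sum()

# Alternating updates: the policy descends this objective while the
# multipliers ascend it, so persistently violated constraints receive an
# ever larger weight until the policy satisfies them.
log_lambdas = torch.zeros(3, requires_grad=True)
lambda_opt = torch.optim.Adam([log_lambdas], lr=1e-3, maximize=True)
```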
We use a hypernetwork to automatically generate continuous functional representations of images at test time without any additional training. More precisely, the hypernetwork takes an image and returns the weights of a target network representing that image. Since the obtained representation is continuous, we can easily inspect the image at various resolutions. Finally, because we use a single hypernetwork responsible for creating the individual image models, similar images have similar weights in their target networks. As a consequence, interpolation in the space of weights of target networks representing images shows properties similar to those of generative models. To experimentally evaluate the proposed mechanism, we apply it to image super-resolution. Despite using a single model for various scale factors, we obtain results comparable to existing super-resolution methods.
http://arxiv.org/abs/1902.10404
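A minimal sketch of the mechanism above, assuming the target network is a tiny coordinate MLP mapping (x, y) to RGB and that the hypernetwork head operates on some image embedding; the sizes and layers are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class HyperNet(nn.Module):
    """Maps an image embedding to the weights of a small (x, y) -> RGB MLP."""
    def __init__(self, embed_dim=256, hidden=64):
        super().__init__()
        self.hidden = hidden
        # target net: 2 -> hidden -> 3, so we emit (2*h + h) + (h*3 + 3) numbers
        n_params = (2 * hidden + hidden) + (hidden * 3 + 3)
        self.head = nn.Linear(embed_dim, n_params)

    def forward(self, image_embedding, coords):
        h = self.hidden
        p = self.head(image_embedding)
        w1, b1, w2, b2 = torch.split(p, [2 * h, h, h * 3, 3], dim=-1)
        x = torch.relu(coords @ w1.view(2, h) + b1)
        return torch.sigmoid(x @ w2.view(h, 3) + b2)   # RGB in [0, 1]

# Because the representation is a continuous function of (x, y), sampling a
# denser coordinate grid at test time yields the image at a higher resolution.
net = HyperNet()
coords = torch.rand(1024, 2)            # any resolution we like
rgb = net(torch.randn(256), coords)     # (1024, 3)
```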
Convolutional Neural Network based approaches for monocular 3D human pose estimation usually require a large amount of training images with 3D pose annotations. While it is feasible to provide 2D joint annotations for large corpora of in-the-wild images with humans, providing accurate 3D annotations to such in-the-wild corpora is hardly feasible in practice. Most existing 3D labelled data sets are either synthetically created or feature in-studio images. 3D pose estimation algorithms trained on such data often have limited ability to generalize to real world scene diversity. We therefore propose a new deep learning based method for monocular 3D human pose estimation that shows high accuracy and generalizes better to in-the-wild scenes. It has a network architecture that comprises a new disentangled hidden space encoding of explicit 2D and 3D features, and uses supervision by a new learned projection model from predicted 3D pose. Our algorithm can be jointly trained on image data with 3D labels and image data with only 2D labels. It achieves state-of-the-art accuracy on challenging in-the-wild data.
http://arxiv.org/abs/1904.03289
In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data. Our model, Jasper, uses only 1D convolutions, batch normalization, ReLU, dropout, and residual connections. To improve training, we further introduce a new layer-wise optimizer called NovoGrad. Through experiments, we demonstrate that the proposed deep architecture performs as well as or better than more complex choices. Our deepest Jasper variant uses 54 convolutional layers. With this architecture, we achieve 2.95% WER using a beam-search decoder with an external neural language model and 3.86% WER with a greedy decoder on LibriSpeech test-clean. We also report competitive results on the Wall Street Journal and the Hub5'00 conversational evaluation datasets.
http://arxiv.org/abs/1904.03288
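The building block named in the abstract above (1D convolution, batch normalization, ReLU, dropout, residual connection) can be sketched directly; channel counts, kernel size, and the number of sub-blocks are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class JasperBlock(nn.Module):
    """A residual block built only from the ops listed in the abstract:
    1D conv -> batch norm -> ReLU -> dropout, repeated, plus a skip path."""
    def __init__(self, channels=256, kernel=11, repeats=3, p_drop=0.2):
        super().__init__()
        layers = []
        for _ in range(repeats):
            layers += [
                nn.Conv1d(channels, channels, kernel, padding=kernel // 2),
                nn.BatchNorm1d(channels),
                nn.ReLU(),
                nn.Dropout(p_drop),
            ]
        self.body = nn.Sequential(*layers)

    def forward(self, x):             # x: (batch, channels, time)
        return torch.relu(x + self.body(x))

feats = torch.randn(4, 256, 200)      # e.g. filterbank frames projected to 256 channels
out = JasperBlock()(feats)            # same shape, ready for the next block
```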
While there have been many proposals on how to make AI algorithms more transparent, few have attempted to evaluate the impact of AI explanations on human performance on a task using AI. We propose a Twenty-Questions style collaborative image guessing game, Explanation-assisted Guess Which (ExAG) as a method of evaluating the efficacy of explanations in the context of Visual Question Answering (VQA) - the task of answering natural language questions on images. We study the effect of VQA agent explanations on the game performance as a function of explanation type and quality. We observe that “effective” explanations are not only conducive to game performance (by almost 22% for “excellent” rated explanations), but also helpful when VQA system answers are erroneous or noisy (by almost 30% compared to no explanations). We also see that players develop a preference for explanations even when penalized and that the explanations are mostly rated as “helpful”.
http://arxiv.org/abs/1904.03285
A few methods have recently been proposed for text-to-video moment retrieval using natural language queries, but they require full supervision during training. However, acquiring a large number of training videos with temporal boundary annotations for each text description is extremely time-consuming and often not scalable. To cope with this issue, in this work we introduce the problem of learning from weak labels for the task of text-to-video moment retrieval. The supervision is weak because, during training, we only have access to the video-text pairs rather than the temporal extent of the video to which different text descriptions relate. We propose a joint visual-semantic embedding based framework that learns the notion of relevant segments from video using only video-level sentence descriptions. Specifically, our main idea is to utilize latent alignment between video frames and sentence descriptions using Text-Guided Attention (TGA). TGA is then used during the test phase to retrieve relevant moments. Experiments on two benchmark datasets demonstrate that our method achieves comparable performance to state-of-the-art fully supervised approaches.
http://arxiv.org/abs/1904.03282
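A minimal sketch of what text-guided attention could look like: sentence-conditioned weights over frame features produce a video-level representation that is matched to the sentence in a joint embedding space, and the weights themselves indicate the relevant moment at test time. Dimensions and the dot-product similarity are assumptions.

```python
import torch
import torch.nn.functional as F

def text_guided_attention(frame_feats, sent_emb):
    """frame_feats: (T, d) per-frame features; sent_emb: (d,) sentence embedding.

    Frames that align with the sentence get high weights; the weighted sum is
    the video-level feature used for the joint visual-semantic embedding."""
    scores = frame_feats @ sent_emb                 # (T,) frame-sentence similarity
    weights = F.softmax(scores, dim=0)              # attention over time
    video_feat = weights @ frame_feats              # (d,) attended video feature
    return video_feat, weights

frames = torch.randn(120, 512)
sentence = torch.randn(512)
video_feat, weights = text_guided_attention(frames, sentence)
moment = weights.topk(5).indices                    # frames most relevant to the query
```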
We introduce a prediction-driven method for visual tracking and segmentation in videos. Instead of relying solely on matching with appearance cues for tracking, we build a predictive model that efficiently guides the search for more accurate tracking regions. With the proposed prediction mechanism, we improve the model's robustness against distractions and occlusions during tracking. We demonstrate significant improvements over state-of-the-art methods not only on visual tracking tasks (VOT 2016 and VOT 2018) but also on video segmentation datasets (DAVIS 2016 and DAVIS 2017).
http://arxiv.org/abs/1904.03280
Neural approaches to Natural Language Generation (NLG) have been promising for goal-oriented dialogue. One of the challenges of productionizing these approaches, however, is the ability to control response quality and ensure that generated responses are acceptable. We propose the use of a generate, filter, and rank framework, in which candidate responses are first filtered to eliminate unacceptable responses and then ranked to select the best response. While acceptability includes grammatical correctness and semantic correctness, we focus only on grammaticality classification in this paper, and show that existing datasets for grammatical error correction do not correctly capture the distribution of errors that data-driven generators are likely to make. We release a grammaticality classification and semantic correctness classification dataset for the weather domain that consists of responses generated by 3 data-driven NLG systems. We then explore two supervised learning approaches (CNNs and GBDTs) for classifying grammaticality. Our experiments show that grammaticality classification is very sensitive to the distribution of errors in the data, and that these distributions vary significantly with both the source of the response and the domain. We show that it is possible to achieve high precision with reasonable recall on our dataset.
http://arxiv.org/abs/1904.03279
Large datasets are the cornerstone of recent advances in computer vision using deep learning. In contrast, existing human motion capture (mocap) datasets are small and the motions limited, hampering progress on learning models of human motion. While there are many different datasets available, they each use a different parameterization of the body, making it difficult to integrate them into a single meta dataset. To address this, we introduce AMASS, a large and varied database of human motion that unifies 15 different optical marker-based mocap datasets by representing them within a common framework and parameterization. We achieve this using a new method, MoSh++, that converts mocap data into realistic 3D human meshes represented by a rigged body model; here we use SMPL [doi:10.1145/2816795.2818013], which is widely used and provides a standard skeletal representation as well as a fully rigged surface mesh. The method works for arbitrary marker sets, while recovering soft-tissue dynamics and realistic hand motion. We evaluate MoSh++ and tune its hyperparameters using a new dataset of 4D body scans that are jointly recorded with marker-based mocap. The consistent representation of AMASS makes it readily useful for animation, visualization, and generating training data for deep learning. Our dataset is significantly richer than previous human motion collections, having more than 40 hours of motion data, spanning over 300 subjects, more than 11,000 motions, and will be publicly available to the research community.
http://arxiv.org/abs/1904.03278
We propose a novel probabilistic generative model for action sequences. The model is termed the Action Point Process VAE (APP-VAE), a variational auto-encoder that can capture the distribution over the times and categories of action sequences. Modeling the variety of possible action sequences is a challenge, which we show can be addressed via the APP-VAE’s use of latent representations and non-linear functions to parameterize distributions over which event is likely to occur next in a sequence and at what time. We empirically validate the efficacy of APP-VAE for modeling action sequences on the MultiTHUMOS and Breakfast datasets.
http://arxiv.org/abs/1904.03273
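A sketch of the decoder heads implied by the abstract above: from a latent sample, predict a categorical distribution over the next action and a distribution over the time until it occurs. The exponential time model, layer sizes, and names are assumptions, not the paper's parameterization.

```python
import torch
import torch.nn as nn

class ActionTimeDecoder(nn.Module):
    """Parameterizes p(next action category) and p(time to next action) given z."""
    def __init__(self, latent_dim=64, hidden=128, n_actions=20):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU())
        self.cat_head = nn.Linear(hidden, n_actions)   # logits over action categories
        self.rate_head = nn.Linear(hidden, 1)          # log-rate of an exponential

    def forward(self, z):
        h = self.trunk(z)
        p_action = torch.distributions.Categorical(logits=self.cat_head(h))
        p_dt = torch.distributions.Exponential(self.rate_head(h).exp().squeeze(-1))
        return p_action, p_dt

# Negative log-likelihood of an observed (category, inter-arrival time) pair,
# which would be combined with the KL term of the VAE objective.
dec = ActionTimeDecoder()
p_action, p_dt = dec(torch.randn(8, 64))
nll = -(p_action.log_prob(torch.randint(0, 20, (8,))) + p_dt.log_prob(torch.rand(8))).mean()
```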
Developing intelligent virtual characters has attracted a lot of attention in recent years. The process of creating such characters often involves a team of creative authors who describe different aspects of the characters in natural language, and planning experts who translate this description into a planning domain. This can be quite challenging, as the team of creative authors must diligently define every aspect of the character, especially if it exhibits complex human-like behavior, and a team of engineers has to manually translate the natural language description of a character's personality into planning domain knowledge. This can be extremely time- and resource-demanding and can be an obstacle to the authors' creativity. The goal of this paper is to introduce an authoring assistant tool that automates the process of domain generation from natural language descriptions of virtual characters, thus bridging the gap between the creative authoring team and the planning domain experts. Moreover, the proposed tool also identifies possibly missing information in the domain description and iteratively makes suggestions to the author.
http://arxiv.org/abs/1904.03266
Population age information is an essential characteristic of clinical trials. In this paper, we focus on extracting minimum and maximum (min/max) age values for the study samples from clinical research articles. Specifically, we investigate the use of a neural network model for question answering to address this information extraction task. The min/max age QA model is trained on the massive set of structured clinical study records from ClinicalTrials.gov. For each article, based on multiple min and max age values extracted by the QA model, we predict both the actual min and max age values for the study samples and filter out non-factual age expressions. Our system improves over (i) a passage-retrieval based IE system and (ii) a CRF-based system by a large margin when evaluated on an annotated dataset consisting of 50 research papers on smoking cessation.
http://arxiv.org/abs/1904.03262
We present a novel method enabling robots to quickly learn to manipulate objects by leveraging a motion planner to generate “expert” training trajectories from a small amount of human-labeled data. In contrast to the traditional sense-plan-act cycle, we propose a deep learning architecture and training regimen called PtPNet that can estimate effective end-effector trajectories for manipulation directly from a single RGB-D image of an object. Additionally, we present a data collection and augmentation pipeline that enables the automatic generation of large numbers (millions) of training image and trajectory examples with almost no human labeling effort. We demonstrate our approach in a non-prehensile tool-based manipulation task, specifically picking up shoes with a hook. In hardware experiments, PtPNet generates motion plans (open-loop trajectories) that reliably (89% success over 189 trials) pick up four very different shoes from a range of positions and orientations, and reliably picks up a shoe it has never seen before. Compared with a traditional sense-plan-act paradigm, our system has the advantages of operating on sparse information (single RGB-D frame), producing high-quality trajectories much faster than the “expert” planner (300ms versus several seconds), and generalizing effectively to previously unseen shoes.
http://arxiv.org/abs/1904.03260
Is all of machine learning supervised to some degree? The field of machine learning has traditionally been categorized pedagogically into $supervised~vs~unsupervised~learning$, where supervised learning has typically referred to learning from labeled data, while unsupervised learning has typically referred to learning from unlabeled data. In this paper, we assert that all machine learning is in fact supervised to some degree, and that the scope of supervision is necessarily commensurate to the scope of learning potential. In particular, we argue that clustering algorithms such as k-means, and dimensionality reduction algorithms such as principal component analysis, variational autoencoders, and deep belief networks, are each internally supervised by the data itself to learn their respective representations of its features. Furthermore, these algorithms are not capable of external inference until their respective outputs (clusters, principal components, or representation codes) have, in effect, been identified and externally labeled. As such, they do not suffice as examples of unsupervised learning. We propose that the categorization `supervised vs unsupervised learning' be dispensed with, and instead, learning algorithms be categorized as either $internally~or~externally~supervised$ (or both). We believe this change in perspective will yield new fundamental insights into the structure and character of data and of learning algorithms.
http://arxiv.org/abs/1904.03259
We describe a transfer method based on annotation projection to develop a dependency-based semantic role labeling system for languages for which no supervised linguistic information other than parallel data is available. Unlike previous work that presumes the availability of supervised features such as lemmas, part-of-speech tags, and dependency parse trees, we only make use of word and character features. Our deep model considers using character-based representations as well as unsupervised stem embeddings to alleviate the need for supervised features. Our experiments outperform a state-of-the-art method that uses supervised lexico-syntactic features on 6 out of 7 languages in the Universal Proposition Bank.
http://arxiv.org/abs/1904.03256
We address the challenging problem of learning motion representations using deep models for video recognition. To this end, we make use of attention modules that learn to highlight regions in the video and aggregate features for recognition. Specifically, we propose to leverage output attention maps as a vehicle to transfer the learned representation from a motion (flow) network to an RGB network. We systematically study the design of attention modules, and develop a novel method for attention distillation. Our method is evaluated on major action benchmarks, and consistently improves the performance of the baseline RGB network by a significant margin. Moreover, we demonstrate that our attention maps can leverage motion cues in learning to identify the location of actions in video frames. We believe our method provides a step towards learning motion-aware representations in deep models.
http://arxiv.org/abs/1904.03249
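One way to read the attention-distillation step above: train the RGB network so its attention map matches the (frozen) flow network's map, alongside the usual classification loss. The normalization and KL form below are assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def attention_distillation_loss(rgb_attn, flow_attn):
    """rgb_attn, flow_attn: (B, H, W) spatial attention maps.

    Normalize each map into a distribution over locations and penalize the
    divergence, so the RGB stream learns where the motion stream looks."""
    rgb = F.log_softmax(rgb_attn.flatten(1), dim=1)
    flow = F.softmax(flow_attn.flatten(1), dim=1).detach()   # teacher is frozen
    return F.kl_div(rgb, flow, reduction="batchmean")

loss = attention_distillation_loss(torch.randn(4, 14, 14), torch.randn(4, 14, 14))
# total loss = classification_loss + lambda * loss   (lambda is a tuning knob)
```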
We propose an approach for unsupervised adaptation of object detectors from label-rich to label-poor domains which can significantly reduce annotation costs associated with detection. Recently, approaches that align distributions of source and target images using an adversarial loss have been proven effective for adapting object classifiers. However, for object detection, fully matching the entire distributions of source and target images to each other at the global image level may fail, as domains could have distinct scene layouts and different combinations of objects. On the other hand, strong matching of local features such as texture and color makes sense, as it does not change category level semantics. This motivates us to propose a novel method for detector adaptation based on strong local alignment and weak global alignment. Our key contribution is the weak alignment model, which focuses the adversarial alignment loss on images that are globally similar and puts less emphasis on aligning images that are globally dissimilar. Additionally, we design the strong domain alignment model to only look at local receptive fields of the feature map. We empirically verify the effectiveness of our method on four datasets comprising both large and small domain shifts. Our code is available at \url{this https URL}
https://arxiv.org/abs/1812.04798
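A sketch of one way to realize the "weak global alignment" described above: a focal-style weighting of the image-level domain-adversarial loss, so that images the domain classifier already separates easily (globally dissimilar ones) contribute little. The loss form and the exponent are assumptions.

```python
import torch

def weak_global_alignment_loss(domain_logits, is_source, gamma=3.0):
    """Focal-style domain-adversarial loss (illustrative reading of weak alignment).

    domain_logits : (B,) outputs of the global domain classifier
    is_source     : (B,) 1.0 for source images, 0.0 for target images

    Easy-to-separate images get a probability near the correct domain, so the
    (1 - p)^gamma factor down-weights them; hard, globally similar images keep
    driving the adversarial alignment."""
    p_correct = torch.sigmoid(domain_logits) * is_source + \
                torch.sigmoid(-domain_logits) * (1 - is_source)
    ce = -torch.log(p_correct.clamp_min(1e-6))
    return ((1 - p_correct) ** gamma * ce).mean()

loss = weak_global_alignment_loss(
    torch.randn(8), torch.tensor([1., 1., 1., 1., 0., 0., 0., 0.]))
```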
The shift to electronic medical records (EMRs) has engendered research into machine learning and natural language technologies to analyze patient records and to predict clinical outcomes of interest from them. Two observations motivate our aims here. First, unstructured notes contained within EMRs often contain key information and hence should be exploited by models. Second, while strong predictive performance is important, interpretability of models is perhaps equally so for applications in this domain. Together, these points suggest that neural models for EMRs may benefit from incorporating attention over notes, which one may hope will both yield performance gains and afford transparency in predictions. In this work we perform experiments to explore this question using two EMR corpora and four different predictive tasks. We find that: (i) inclusion of attention mechanisms is critical for neural encoder modules that operate over notes fields to yield competitive performance, but (ii) unfortunately, while attention boosts predictive performance, it is decidedly less clear whether it provides meaningful support for predictions.
http://arxiv.org/abs/1904.03244
We present an environment, benchmark, and deep learning driven automated theorem prover for higher-order logic. Higher-order interactive theorem provers enable the formalization of arbitrary mathematical theories and thereby present an interesting, open-ended challenge for deep learning. We provide an open-source framework based on the HOL Light theorem prover that can be used as a reinforcement learning environment. HOL Light comes with a broad coverage of basic mathematical theorems on calculus and the formal proof of the Kepler conjecture, from which we derive a challenging benchmark for automated reasoning. We also present a deep reinforcement learning driven automated theorem prover, DeepHOL, with strong initial results on this benchmark.
http://arxiv.org/abs/1904.03241
This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations. In contrast to other speech representation learning methods that aim to remove noise or speaker variabilities, ours is designed to preserve information for a wide range of downstream tasks. In addition, the proposed model does not require any phonetic or word boundary labels, allowing the model to benefit from large quantities of unlabeled data. Speech representations learned by our model significantly improve performance on both phone classification and speaker verification over the surface features and other supervised and unsupervised approaches. Further analysis shows that different levels of speech information are captured by our model at different layers. In particular, the lower layers tend to be more discriminative for speakers, while the upper layers provide more phonetic content.
http://arxiv.org/abs/1904.03240
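A minimal sketch of an autoregressive predictive model for speech, in the spirit of the abstract above: a recurrent network over acoustic frames trained to predict a frame several steps ahead, whose hidden states then serve as the learned representation. The feature type, prediction shift, and L1 loss are assumptions.

```python
import torch
import torch.nn as nn

class AutoregressivePredictor(nn.Module):
    """Predict the acoustic frame `shift` steps in the future from past frames only."""
    def __init__(self, n_mels=80, hidden=512, layers=3):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, num_layers=layers, batch_first=True)
        self.proj = nn.Linear(hidden, n_mels)

    def forward(self, frames):                  # (B, T, n_mels)
        h, _ = self.rnn(frames)
        return self.proj(h)                     # predicted future frames

shift = 3                                       # how far ahead to predict
model = AutoregressivePredictor()
x = torch.randn(2, 200, 80)                     # log-mel spectrogram
pred = model(x[:, :-shift])                     # hidden state at time t ...
loss = torch.nn.functional.l1_loss(pred, x[:, shift:])   # ... predicts frame t+shift
# After training, the RNN hidden states at different layers are used as the
# representation for downstream tasks such as phone classification.
```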
Instance segmentation aims to detect and segment individual objects in a scene. Most existing methods rely on precise mask annotations of every category. However, it is difficult and costly to segment objects in novel categories because a large number of mask annotations is required. We introduce ShapeMask, which learns the intermediate concept of object shape to address the problem of generalization in instance segmentation to novel categories. ShapeMask starts with a bounding box detection and gradually refines it by first estimating the shape of the detected object through a collection of shape priors. Next, ShapeMask refines the coarse shape into an instance level mask by learning instance embeddings. The shape priors provide a strong cue for object-like prediction, and the instance embeddings model the instance specific appearance information. ShapeMask significantly outperforms the state-of-the-art by 6.4 and 3.8 AP when learning across categories, and obtains competitive performance in the fully supervised setting. It is also robust to inaccurate detections, decreased model capacity, and small training data. Moreover, it runs efficiently with 150ms inference time and trains within 11 hours on TPUs. With a larger backbone model, ShapeMask increases the gap with state-of-the-art to 9.4 and 6.2 AP across categories. Code will be released.
http://arxiv.org/abs/1904.03239
While recurrent neural networks have found success in a variety of natural language processing applications, they are general models of sequential data. We investigate how the properties of natural language data affect an LSTM’s ability to learn a nonlinguistic task: recalling elements from its input. We find that models trained on natural language data are able to recall tokens from much longer sequences than models trained on non-language sequential data. Furthermore, we show that the LSTM learns to solve the memorization task by explicitly using a subset of its neurons to count timesteps in the input. We hypothesize that the patterns and structure in natural language data enable LSTMs to learn by providing approximate ways of reducing loss, but understanding the effect of different training data on the learnability of LSTMs remains an open question.
http://arxiv.org/abs/1805.11653
Collective motion is one of the most fascinating phenomena observed in nature. In the last decade, it has attracted much attention in the physics, control, and robotics fields. In particular, many studies in swarm robotics have addressed collective motion, also called flocking. In most of these studies, robots use the orientation and proximity of their neighbors to achieve collective motion. In such an approach, one of the biggest problems is measuring orientation information using on-board sensors, so in most studies this information is either simulated or obtained through communication. In this paper we present, to the best of our knowledge, the first implementation of fully autonomous coordinated motion without alignment, using very simple Mona robots. We use an approach based on the Active Elastic Sheet (AES) method, which we modify by adding the capability to make the swarm move toward a desired direction and rotate about an arbitrary point. The parameters of the modified method are optimized using the TCACS optimization algorithm. We test our approach in different settings using Matlab and Webots.
http://arxiv.org/abs/1904.03230
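The basic Active Elastic Sheet update (before the paper's modifications for steering toward a desired direction and rotating about a point) can be sketched as below; spring stiffness, speeds, and gains are illustrative values.

```python
import numpy as np

def aes_step(pos, heading, links, rest_len, v0=0.05, alpha=1.0, beta=1.0, k=1.0, dt=0.1):
    """One Active Elastic Sheet step: spring forces from linked neighbors push
    each robot forward/backward along its heading and turn it left/right.

    pos: (N, 2) positions   heading: (N,) orientations   links: list of (i, j)
    rest_len: dict mapping (i, j) -> natural spring length
    """
    n_hat = np.stack([np.cos(heading), np.sin(heading)], axis=1)
    force = np.zeros_like(pos)
    for i, j in links:
        d = pos[j] - pos[i]
        dist = np.linalg.norm(d) + 1e-9
        f = k * (dist - rest_len[(i, j)]) * (d / dist)   # stretched spring pulls i toward j
        force[i] += f
        force[j] -= f
    fwd = np.sum(force * n_hat, axis=1)                   # force component along heading
    perp = np.stack([-n_hat[:, 1], n_hat[:, 0]], axis=1)
    turn = np.sum(force * perp, axis=1)                   # force component across heading
    new_pos = pos + dt * (v0 + alpha * fwd)[:, None] * n_hat
    new_heading = heading + dt * beta * turn
    return new_pos, new_heading

# Three robots in a line, linked as a chain with unit rest length.
pos = np.array([[0.0, 0.0], [1.2, 0.0], [2.0, 0.0]])
heading = np.zeros(3)
links = [(0, 1), (1, 2)]
rest_len = {(0, 1): 1.0, (1, 2): 1.0}
pos, heading = aes_step(pos, heading, links, rest_len)
```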
Recently natural language processing (NLP) tools have been developed to identify and extract salient risk indicators in electronic health records (EHRs). Sentiment analysis, although widely used in non-medical areas for improving decision making, has been studied minimally in the clinical setting. In this study, we undertook, to our knowledge, the first domain adaptation of sentiment analysis to psychiatric EHRs by defining psychiatric clinical sentiment, performing an annotation project, and evaluating multiple sentence-level sentiment machine learning (ML) models. Results indicate that off-the-shelf sentiment analysis tools fail in identifying clinically positive or negative polarity, and that the definition of clinical sentiment that we provide is learnable with relatively small amounts of training data. This project is an initial step towards further refining sentiment analysis methods for clinical use. Our long-term objective is to incorporate the results of this project as part of a machine learning model that predicts inpatient readmission risk. We hope that this work will initiate a discussion concerning domain adaptation of sentiment analysis to the clinical setting.
http://arxiv.org/abs/1904.03225
Existing Machine Learning techniques yield close-to-human performance on text-based classification tasks. However, the presence of multi-modal noise in chat data such as emoticons, slang, spelling mistakes, code-mixed data, etc., makes existing deep-learning solutions perform poorly. The inability of deep-learning systems to robustly capture these covariates puts a cap on their performance. We propose NELEC: Neural and Lexical Combiner, a system that elegantly combines lexical and deep-learning based methods for sentiment classification. We evaluate our system on Task 3 of SemEval-2019, 'Contextual Emotion Detection in Text'. Our system performs significantly better than the baseline, as well as our deep-learning model benchmarks. It achieved a micro-averaged F1 score of 0.7765, ranking 3rd on the test-set leaderboard. Our code is available at https://github.com/iamgroot42/nelec
http://arxiv.org/abs/1904.03223
Deep learning has enabled impressive progress in the accuracy of semantic segmentation. Yet, the ability to estimate uncertainty and detect failure is key for safety-critical applications like autonomous driving. Existing uncertainty estimates have mostly been evaluated on simple tasks, and it is unclear whether these methods generalize to more complex scenarios. We present Fishyscapes, the first public benchmark for uncertainty estimation in a real-world task of semantic segmentation for urban driving. It evaluates pixel-wise uncertainty estimates and covers the detection of both out-of-distribution objects and misclassifications. We adapt state-of-the-art methods to recent semantic segmentation models and compare approaches based on softmax confidence, Bayesian learning, and embedding density. A thorough evaluation of these methods reveals a clear gap to their alleged capabilities. Our results show that failure detection is far from solved even for ordinary situations, while our benchmark allows measuring advancements beyond the state-of-the-art.
http://arxiv.org/abs/1904.03215
Sketch-based image retrieval (SBIR) is widely recognized as an important vision problem with a wide range of real-world applications. Recently, research interest has arisen in solving this problem under the more realistic and challenging setting of zero-shot learning. In this paper, we investigate this problem from the viewpoint of domain adaptation, which we show is critical for improving feature embedding in the zero-shot scenario. Based on a framework that starts with a model pre-trained on ImageNet and fine-tunes it on the training set of an SBIR benchmark, we advocate the importance of preserving previously acquired knowledge, e.g., the rich discriminative features learned from ImageNet, so as to improve the model's transfer ability. For this purpose, we design an approach named Semantic-Aware Knowledge prEservation (SAKE), which fine-tunes the pre-trained model in an economical way and leverages semantic information, e.g., inter-class relationships, to achieve the goal of knowledge preservation. Zero-shot experiments on two extended SBIR datasets, TU-Berlin and Sketchy, verify the superior performance of our approach. Extensive diagnostic experiments validate that the preserved knowledge benefits SBIR in zero-shot settings, as a large fraction of the performance gain comes from the more properly structured feature embedding for photo images.
http://arxiv.org/abs/1904.03208
We present an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner. The proposed model is simple and uses the visual features of the human, relative spatial orientation of the human and the object, and the knowledge that functionally similar objects take part in similar interactions with humans. We provide extensive experimental validation for our approach and demonstrate state-of-the-art results for HOI detection. On the HICO-Det dataset our method achieves a gain of over 7% absolute points in mean average precision (mAP) over published literature and even a gain of over 2.5% absolute mAP over contemporary work. We also show that our approach leads to significant performance gains for zero-shot HOI detection in the seen object setting. We further demonstrate that using a generic object detector, our model can generalize to interactions involving previously unseen objects.
https://arxiv.org/abs/1904.03181
A key stepping stone in the development of an artificial general intelligence (a machine that can perform any task), is the production of agents that can perform multiple tasks at once instead of just one. Unfortunately, canonical methods are very prone to catastrophic forgetting (CF) - the act of overwriting previous knowledge about a task when learning a new task. Recent efforts have developed techniques for overcoming CF in learning systems, but no attempt has been made to apply these new techniques to evolutionary systems. This research presents a novel technique, weight protection, for reducing CF in evolutionary systems by adapting a method from learning systems. It is used in conjunction with other evolutionary approaches for overcoming CF and is shown to be effective at alleviating CF when applied to a suite of reinforcement learning tasks. It is speculated that this work could indicate the potential for a wider application of existing learning-based approaches to evolutionary systems and that evolutionary techniques may be competitive with or better than learning systems when it comes to reducing CF.
https://arxiv.org/abs/1904.03178
Physical construction – the ability to compose objects, subject to physical dynamics, in order to serve some function – is fundamental to human intelligence. Here we introduce a suite of challenging physical construction tasks inspired by how children play with blocks, such as matching a target configuration, stacking and attaching blocks to connect objects together, and creating shelter-like structures over target objects. We then examine how a range of modern deep reinforcement learning agents fare on these challenges, and introduce several new approaches which provide superior performance. Our results show that agents which use structured representations (e.g., objects and scene graphs) and structured policies (e.g., object-centric actions) outperform those which use less structured representations, and generalize better beyond their training when asked to reason about larger scenes. Agents which use model-based planning via Monte-Carlo Tree Search also outperform strictly model-free agents in our most challenging construction problems. We conclude that approaches which combine structured representations and reasoning with powerful learning are a key path toward agents that possess rich intuitive physics, scene understanding, and planning.
https://arxiv.org/abs/1904.03177
Although low-rank and sparse decomposition based methods have been successfully applied to the problem of moving object detection using structured sparsity-inducing norms, they remain vulnerable to the significant illumination changes that arise in certain applications. We are interested in moving object detection in applications involving time-lapse image sequences, for which current methods mistakenly group moving objects and illumination changes into the foreground. Our method relies on the multilinear (tensor) low-rank and sparse decomposition framework to address the weaknesses of existing methods. The key to our proposed method is to first create a set of prior maps that characterize the changes in the image sequence due to illumination; we show that these can be detected by a k-support norm. To deal with these two types of concurrent changes, we employ two regularization terms in the tensor low-rank and sparse decomposition formulation, one for detecting moving objects and the other for accounting for illumination changes. Through comprehensive experiments using challenging datasets, we show that our method demonstrates a remarkable ability to detect moving objects under discontinuous changes in illumination and outperforms the state-of-the-art solutions to this challenging problem.
https://arxiv.org/abs/1904.03175
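A plausible shape of the formulation described above, written with illustrative symbols rather than the paper's exact notation: the observed sequence tensor is split into a low-rank background, a structured-sparse moving-object term, and an illumination term regularized with the k-support norm guided by the prior maps,

\[
\min_{\mathcal{L},\,\mathcal{S},\,\mathcal{E}}\ \|\mathcal{L}\|_{*}
  \;+\; \lambda_{1}\,\Omega_{\mathrm{struct}}(\mathcal{S})
  \;+\; \lambda_{2}\,\|\mathcal{E}\|_{k\text{-sup}}
  \quad \text{s.t.}\quad \mathcal{X} = \mathcal{L} + \mathcal{S} + \mathcal{E},
\]

where \(\Omega_{\mathrm{struct}}\) is a structured sparsity-inducing norm for the foreground and the two \(\lambda\) weights balance the moving-object and illumination terms.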
The rapidly growing amount of data that scientific content providers must deliver to users makes effective recommendation tools essential. The title of an article is often the only element shown to attract a reader's attention. We offer an approach to automatically generating titles with various levels of informativeness, so as to benefit different categories of users. Statistics from ResearchGate, used to bias the training datasets, and a specially designed post-processing step applied to neural sequence-to-sequence models allow us to reach the desired variety of simplified titles, achieving a trade-off between the attractiveness and transparency of a recommendation.
https://arxiv.org/abs/1904.03172
We propose an efficient algorithm to embed a given image into the latent space of StyleGAN. This embedding enables semantic image editing operations that can be applied to existing photographs. Taking the StyleGAN trained on the FFHQ dataset as an example, we show results for image morphing, style transfer, and expression transfer. Studying the results of the embedding algorithm provides valuable insights into the structure of the StyleGAN latent space. We propose a set of experiments to test what class of images can be embedded, how they are embedded, what latent space is suitable for embedding, and whether the embedding is semantically meaningful.
http://arxiv.org/abs/1904.03189
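The abstract above does not spell out the embedding algorithm; a common recipe for this setting, sketched here purely as an assumption, is direct optimization of a latent code against pixel and perceptual reconstruction losses.

```python
import torch
import torch.nn.functional as F

def embed_image(generator, feat_extractor, target, latent_dim=512, steps=500, lr=0.01):
    """Optimize a latent code w so that generator(w) reproduces `target`.

    `generator` and `feat_extractor` are assumed callables (a pretrained
    StyleGAN synthesis network and a perceptual feature network); this is a
    generic latent-optimization sketch, not the paper's exact procedure."""
    w = torch.zeros(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        img = generator(w)
        loss = F.mse_loss(img, target) + \
               F.mse_loss(feat_extractor(img), feat_extractor(target))
        loss.backward()
        opt.step()
    return w.detach()   # edits (morphing, style/expression transfer) act on w

# Dummy stand-ins so the sketch runs end to end; swap in real networks.
generator = torch.nn.Linear(512, 3 * 64 * 64)
feat_extractor = torch.nn.Identity()
target = torch.randn(1, 3 * 64 * 64)
w = embed_image(generator, feat_extractor, target, steps=10)
```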
The Knowledge Base (KB) used for real-world applications, such as booking a movie or making a restaurant reservation, keeps changing over time. End-to-end neural networks trained for these task-oriented dialogs are expected to be immune to any changes in the KB. However, existing approaches break down when asked to handle such changes. We propose an encoder-decoder architecture (BoSsNet) with a novel Bag-of-Sequences (BoSs) memory, which facilitates the disentangled learning of the response's language model and its knowledge incorporation. Consequently, the KB can be modified with new knowledge without a drop in interpretability. We find that BoSsNet outperforms state-of-the-art models, with considerable improvements (> 10\%) on bAbI OOV test sets and other human-human datasets. We also systematically modify existing datasets to measure disentanglement and show BoSsNet to be robust to KB modifications.
http://arxiv.org/abs/1805.01216
One of the most important prerequisites for creating and evaluating 6D object pose detectors are datasets with labeled 6D poses. With the advent of deep learning methods, the demand for such datasets is continuously growing. Despite the fact that some such datasets exist, they are scarce and typically have restricted setups, e.g. a single object per sequence, or focus on specific object types, such as textureless industrial parts. Besides, two significant components are often ignored: training only from available 3D models instead of real data, and scalability, i.e. training one method to detect all objects rather than training one detector per object. Other challenges, such as occlusions, changing light conditions and object appearance changes, as well as precisely defined benchmarks, are either not present or scattered among different datasets. In this paper we present a dataset for 6D pose estimation that covers the above-mentioned challenges, mainly targeting training from 3D models (both textured and textureless), scalability, occlusions, and light and object appearance changes. The dataset features 33 objects (17 toy, 8 household and 8 industry-relevant objects) over 13 scenes of various difficulty. Moreover, we present a set of benchmarks with the purpose of testing various desired properties of the detectors, particularly focusing on scalability with respect to the number of objects, resistance to changing light conditions, occlusions and clutter. We also set a baseline for the presented benchmarks using a publicly available state-of-the-art detector. Considering the difficulty of making such datasets, we plan to release the code allowing other researchers to extend this dataset or make their own datasets in the future.
https://arxiv.org/abs/1904.03167
Adjective phrases like “a little bit surprised”, “completely shocked”, or “not stunned at all” are not handled properly by currently published state-of-the-art emotion classification and intensity prediction systems, which predominantly use non-contextualized word embeddings as input. Based on this finding, we analyze differences between the embeddings used by these systems with regard to their capability of handling such cases. Furthermore, we argue that intensifiers in the context of emotion words need special treatment, as is established for sentiment polarity classification, but not for more fine-grained emotion prediction. To resolve this issue, we analyze different aspects of a post-processing pipeline which enriches the word representations of such phrases. This includes expansion of semantic spaces at the phrase level and the sub-word level, followed by retrofitting to emotion lexica. We evaluate the impact of these steps with A La Carte and Bag-of-Substrings extensions based on pretrained GloVe, Word2vec, and fastText embeddings against a crowd-sourced corpus of intensity annotations for tweets containing our focus phrases. We show that the fastText-based models do not gain from handling these specific phrases under inspection. For Word2vec embeddings, we show that our post-processing pipeline improves the results by up to 8% on a novel dataset densely populated with intensifiers.
https://arxiv.org/abs/1904.03164
Providing systems the ability to relate linguistic and visual content is one of the hallmarks of computer vision. Tasks such as text-based image retrieval and image captioning were designed to test this ability but come with evaluation measures that have a high variance or are difficult to interpret. We study an alternative task for systems that match text and images: given a text query, the system is asked to select the image that best matches the query from a pair of semantically similar images. The system’s accuracy on this Binary Image SelectiON (BISON) task is interpretable, eliminates the reliability problems of retrieval evaluations, and focuses on the system’s ability to understand fine-grained visual structure. We gather a BISON dataset that complements the COCO dataset and use it to evaluate modern text-based image retrieval and image captioning systems. Our results provide novel insights into the performance of these systems. The COCO-BISON dataset and corresponding evaluation code are publicly available from \url{this http URL}.
https://arxiv.org/abs/1901.06595
Learning with complete or partial supervision is powerful but relies on ever-growing human annotation efforts. As a way to mitigate this serious problem, as well as to serve specific applications, unsupervised learning has emerged as an important field of research. In computer vision, unsupervised learning comes in various guises. We focus here on the unsupervised discovery and matching of object categories among images in a collection, following the work of Cho et al. 2015. We show that the original approach can be reformulated and solved as a proper optimization problem. Experiments on several benchmarks establish the merit of our approach.
https://arxiv.org/abs/1904.03148
We present a new neural model for text summarization that first extracts sentences from a document and then compresses them. The proposed model offers a balance that sidesteps the difficulties in abstractive methods while generating more concise summaries than extractive methods. In addition, our model dynamically determines the length of the output summary based on the gold summaries it observes during training and does not require length constraints typical to extractive summarization. The model achieves state-of-the-art results on the CNN/DailyMail and Newsroom datasets, improving over current extractive and abstractive methods. Human evaluations demonstrate that our model generates concise and informative summaries. We also make available a new dataset of oracle compressive summaries derived automatically from the CNN/DailyMail reference summaries.
https://arxiv.org/abs/1904.02020