Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. We find that applying orthogonal regularization to the generator renders it amenable to a simple “truncation trick,” allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the Generator’s input. Our modifications lead to models which set the new state of the art in class-conditional image synthesis. When trained on ImageNet at 128x128 resolution, our models (BigGANs) achieve an Inception Score (IS) of 166.5 and Frechet Inception Distance (FID) of 7.4, improving over the previous best IS of 52.52 and FID of 18.6.
https://arxiv.org/abs/1809.11096
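A minimal sketch of the truncation trick described above, assuming a latent-vector generator in PyTorch; the threshold value and the resample-until-within-bounds loop are illustrative, not the authors' exact implementation.

```python
import torch

def truncated_noise(batch_size, dim, threshold=0.5):
    """Sample z ~ N(0, I) and resample any component whose magnitude exceeds
    `threshold`, shrinking the effective variance of the generator input."""
    z = torch.randn(batch_size, dim)
    mask = z.abs() > threshold
    while mask.any():
        z[mask] = torch.randn(int(mask.sum()))
        mask = z.abs() > threshold
    return z

# Smaller thresholds trade sample variety for higher per-sample fidelity, e.g.:
# z = truncated_noise(16, 128, threshold=0.4); images = generator(z)
```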
Biological studies show that hummingbirds can perform extreme aerobatic maneuvers during fast escape. Given a sudden looming visual stimulus at hover, a hummingbird initiates a fast backward translation coupled with a 180-degree yaw turn, which is followed by instant posture stabilization in just under 10 wingbeats. Considering the wingbeat frequency of 40 Hz, this aggressive maneuver is carried out in just 0.2 seconds. Inspired by the hummingbirds’ near-maximal performance during such extreme maneuvers, we developed a flight control strategy and experimentally demonstrated that such maneuverability can be achieved by an at-scale 12-gram hummingbird robot equipped with just two actuators. The proposed hybrid control policy combines model-based nonlinear control with model-free reinforcement learning. We use model-based nonlinear control for nominal flight control, as the dynamic model is relatively accurate in these conditions. However, during extreme maneuvers, the modeling error becomes unmanageable. A model-free reinforcement learning policy trained in simulation was optimized to ‘destabilize’ the system and maximize the performance during maneuvering. The hybrid policy manifests a maneuver that is close to that observed in hummingbirds. Direct simulation-to-real transfer is achieved, demonstrating the hummingbird-like fast evasive maneuvers on the at-scale hummingbird robot.
http://arxiv.org/abs/1902.09626
Low-dose CT denoising is a challenging task that has been studied by many researchers. Some studies have used deep neural networks to improve the quality of low-dose CT images and achieved fruitful results. In this paper, we propose a deep neural network that uses dilated convolutions with different dilation rates instead of standard convolutions, which helps capture more contextual information in fewer layers. We also employ residual learning by creating shortcut connections to transmit image information from the early layers to later ones. To further improve the performance of the network, we introduce a non-trainable edge detection layer that extracts edges in horizontal, vertical, and diagonal directions. Finally, we demonstrate that optimizing the network with a combination of mean-square-error loss and perceptual loss preserves many structural details in the CT image. This objective function does not suffer from the over-smoothing and blurring effects caused by per-pixel loss, nor from the grid-like artifacts resulting from perceptual loss. The experiments show that each modification to the network improves the outcome while only minimally changing the complexity of the network.
http://arxiv.org/abs/1902.10127
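A minimal PyTorch sketch of the two ingredients described above: dilated convolutions with increasing rates plus a residual shortcut from the input to the output. The layer count, channel width, and dilation rates are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """Stack of dilated 3x3 convolutions with a shortcut from input to output."""
    def __init__(self, channels=64, dilations=(1, 2, 4)):
        super().__init__()
        layers = []
        for d in dilations:
            # padding=d keeps the spatial size constant for 3x3 kernels.
            layers += [nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                       nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # Residual learning: the block only has to model the correction.
        return x + self.body(x)

# x = torch.randn(1, 64, 128, 128); y = DilatedResidualBlock()(x)
```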
In this paper, we tackle Automatic Meter Reading (AMR) by leveraging the high capability of Convolutional Neural Networks (CNNs). We design a two-stage approach that employs the Fast-YOLO object detector for counter detection and evaluates three different CNN-based approaches for counter recognition. In the AMR literature, most datasets are not available to the research community since the images belong to a service company. In this sense, we introduce a new public dataset, called the UFPR-AMR dataset, with 2,000 fully and manually annotated images. This dataset is, to the best of our knowledge, three times larger than the largest public dataset found in the literature and contains a well-defined evaluation protocol to assist the development and evaluation of AMR methods. Furthermore, we propose the use of a data augmentation technique to generate a balanced training set with many more examples to train the CNN models for counter recognition. On the proposed dataset, impressive results were obtained and a detailed speed/accuracy trade-off evaluation of each model was performed. On a public dataset, state-of-the-art results were achieved using fewer than 200 images for training.
http://arxiv.org/abs/1902.09600
Finding correspondences between structural entities decomposing images is of high interest for computer vision applications. In particular, we analyze how to accurately track superpixels - visual primitives generated by aggregating adjacent pixels sharing similar characteristics - over extended time periods, relying on unsupervised learning and temporal integration. A two-step video processing pipeline dedicated to long-term superpixel tracking is proposed. First, unsupervised learning-based superpixel matching provides correspondences between consecutive and distant frames using new context-rich features extended from greyscale to multi-channel and forward-backward consistency constraints. Resulting elementary matches are then combined along multi-step paths running through the whole sequence with various inter-frame distances. This produces a large set of candidate long-term superpixel pairings upon which majority voting is performed. Video object tracking experiments demonstrate the accuracy of our elementary estimator against state-of-the-art methods and prove the ability of multi-step integration to provide accurate long-term superpixel matches compared to the usual direct and sequential integration.
http://arxiv.org/abs/1902.09596
Deep Learning (DL) is one of the hottest trends in machine learning as DL approaches produced results superior to the state-of-the-art in problematic areas such as image processing and natural language processing (NLP). To foster the growth of DL, several open source frameworks appeared providing implementations of the most common DL algorithms. These frameworks vary in the algorithms they support and in the quality of their implementations. The purpose of this work is to provide a qualitative and quantitative comparison among three of the most popular and most comprehensive DL frameworks (namely Google’s TensorFlow, University of Montreal’s Theano and Microsoft’s CNTK). The ultimate goal of this work is to help end users make an informed decision about the best DL framework that suits their needs and resources. To ensure that our study is as comprehensive as possible, we conduct several experiments using multiple benchmark datasets from different fields (image processing, NLP, etc.) and measure the performance of the frameworks’ implementations of different DL algorithms. For most of our experiments, we find out that CNTK’s implementations are superior to the other ones under consideration.
http://arxiv.org/abs/1903.00102
The study of deep recurrent neural networks (RNNs) and, in particular, of deep Reservoir Computing (RC) is gaining increasing research attention in the neural networks community. The recently introduced Deep Echo State Network (DeepESN) model opened the way to an extremely efficient approach for designing deep neural networks for temporal data. At the same time, the study of DeepESNs has shed light on the intrinsic properties of state dynamics developed by hierarchical compositions of recurrent layers, i.e. on the bias of depth in RNN architectural design. In this paper, we summarize the advancements in the development, analysis and applications of DeepESNs.
http://arxiv.org/abs/1712.04323
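A minimal NumPy sketch of a single leaky-integrator echo-state reservoir of the kind stacked in a DeepESN; the leaking rate, spectral-radius rescaling, and layer sizes are illustrative assumptions, not the DeepESN configuration from the paper.

```python
import numpy as np

class ReservoirLayer:
    """One untrained reservoir: x(t) = (1-a) x(t-1) + a tanh(W_in u(t) + W x(t-1))."""
    def __init__(self, n_in, n_units=100, leak=0.3, rho=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-1, 1, (n_units, n_in))
        W = rng.uniform(-0.5, 0.5, (n_units, n_units))
        # Rescale the recurrent weights so their spectral radius equals rho.
        self.W = W * (rho / max(abs(np.linalg.eigvals(W))))
        self.leak = leak
        self.x = np.zeros(n_units)

    def step(self, u):
        pre = self.W_in @ u + self.W @ self.x
        self.x = (1 - self.leak) * self.x + self.leak * np.tanh(pre)
        return self.x

# A deep variant feeds each layer's state as the next layer's input at every time step;
# only a linear readout on the (concatenated) states is trained.
```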
This paper describes our system submitted to SemEval 2019 Task 7: RumourEval 2019: Determining Rumour Veracity and Support for Rumours, Subtask A (Gorrell et al., 2019). The challenge focused on classifying whether posts from Twitter and Reddit support, deny, query, or comment on a hidden rumour, the truthfulness of which is the topic of an underlying discussion thread. We formulate the problem as stance classification, determining the rumour stance of a post with respect to the previous thread post and the source thread post. The recent BERT architecture was employed to build an end-to-end system which reached an F1 score of 61.67% on the provided test data. It finished in 2nd place in the competition, only 0.2% behind the winner, without using any hand-crafted features.
http://arxiv.org/abs/1902.10126
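A minimal sketch of fine-tuning BERT for the four stance labels (support, deny, query, comment) with the Hugging Face transformers library; packing the source and previous posts into one context segment paired with the target post is an assumption about the input encoding, not the authors' exact setup.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=4)

# Encode the target post against its conversational context as a sentence pair.
context = "Source post text. Previous post text."
target = "Reply whose stance we want to classify."
inputs = tokenizer(context, target, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits
stance = ["support", "deny", "query", "comment"][logits.argmax(dim=-1).item()]
```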
High assurance surgical robotic systems require robustness to both safety issues and security issues (i.e., adversarial interference). In this work, we argue that safety and security are not disjoint properties, but that security is a safety requirement. Surgical robotics presents new information flow requirements that include multiple levels of confidentiality and integrity, as well as the need for compartmentation arising from conflicts of interest. We develop an information flow model that derives from lattice-based access control. This model addresses the flow constraints of the calibration lifecycle of surgical robots - an important aspect of a high-assurance environment.
http://arxiv.org/abs/1902.09587
Of primary importance in formulating a response to the increasing prevalence and power of artificial intelligence (AI) applications in society are questions of ontology. Questions such as: What “are” these systems? How are they to be regarded? How does an algorithm come to be regarded as an agent? We discuss three factors which hinder discussion and obscure attempts to form a clear ontology of AI: (1) the various and evolving definitions of AI, (2) the tendency for pre-existing technologies to be assimilated and regarded as “normal,” and (3) the tendency of human beings to anthropomorphize. This list is not intended as exhaustive, nor is it seen to preclude entirely a clear ontology; however, these challenges are a necessary set of topics for consideration. Each of these factors is seen to present a ‘moving target’ for discussion, which poses a challenge for both technical specialists and non-practitioners of AI systems development (e.g., philosophers and theologians) to speak meaningfully given that the corpus of AI structures and capabilities evolves at a rapid pace. Finally, we present avenues for moving forward, including opportunities for collaborative synthesis for scholars in philosophy and science.
http://arxiv.org/abs/1903.03171
Data science relies on pipelines that are organized in the form of interdependent computational steps. Each step consists of various candidate algorithms that may be used for performing a particular function. Each algorithm has several hyperparameters. Algorithms and hyperparameters must be optimized as a whole to produce the best performance. Machine learning pipelines typically consist of complex algorithms in each of the steps. Not only is the selection process combinatorial, but it is also important to interpret and understand the pipelines. We propose a method to quantify the importance of different layers in the pipeline, by computing an error contribution relative to an agnostic choice of algorithms in that layer. We demonstrate our methodology on image classification pipelines. The agnostic methodology quantifies the error contributions from the computational steps, algorithms and hyperparameters in the image classification pipeline. We show that algorithm selection and hyper-parameter optimization methods can be used to quantify the error contribution and that random search is able to quantify the contribution more accurately than Bayesian optimization. This methodology can be used by domain experts to understand machine learning and data analysis pipelines in terms of their individual components, which can help in prioritizing different components of the pipeline.
http://arxiv.org/abs/1903.02521
Visual place recognition is particularly challenging when places undergo changes in their appearance. Such changes are indeed common, e.g., due to weather, night/day or seasons. In this paper we build on recent research using deep networks, and explore how they can be improved by exploiting temporal sequence information. Specifically, we propose three alternatives (Descriptor Grouping, Fusion and Recurrent Descriptors) for deep networks to use several frames of a sequence. We show that our approaches produce more compact and better-performing descriptors than single- and multi-view baselines from the literature on two public databases.
http://arxiv.org/abs/1902.09516
A desideratum of high-quality translation systems is that they preserve meaning, in the sense that two sentences with different meanings should not translate to one and the same sentence in another language. However, state-of-the-art systems often fail in this regard, particularly in cases where the source and target languages partition the “meaning space” in different ways. For instance, “I cut my finger.” and “I cut my finger off.” describe different states of the world but are translated to French (by both Fairseq and Google Translate) as “Je me suis coupé le doigt.”, which is ambiguous as to whether the finger is detached. More generally, translation systems are typically many-to-one (non-injective) functions from source to target language, which in many cases results in important distinctions in meaning being lost in translation. Building on Bayesian models of informative utterance production, we present a method to define a less ambiguous translation system in terms of an underlying pre-trained neural sequence-to-sequence model. This method increases injectivity, resulting in greater preservation of meaning as measured by improvement in cycle-consistency, without impeding translation quality (measured by BLEU score).
http://arxiv.org/abs/1902.09514
Many of the recent successful methods for video object segmentation (VOS) are overly complicated, heavily rely on fine-tuning on the first frame, and/or are slow, and are hence of limited practical use. In this work, we propose FEELVOS as a simple and fast method which does not rely on fine-tuning. In order to segment a video, for each frame FEELVOS uses a semantic pixel-wise embedding together with a global and a local matching mechanism to transfer information from the first frame and from the previous frame of the video to the current frame. In contrast to previous work, our embedding is only used as an internal guidance of a convolutional network. Our novel dynamic segmentation head allows us to train the network, including the embedding, end-to-end for the multiple object segmentation task with a cross entropy loss. We achieve a new state of the art in video object segmentation without fine-tuning on the DAVIS 2017 validation set with a J&F measure of 69.1%.
http://arxiv.org/abs/1902.09513
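A minimal sketch of the global matching idea in the abstract above: for every pixel embedding in the current frame, compute the distance to the first-frame embeddings of an object and keep the nearest match as a soft cue. The embedding dimensionality and the Euclidean distance are illustrative assumptions.

```python
import torch

def global_match(curr_emb, ref_emb, ref_mask):
    """curr_emb: (N, D) current-frame pixel embeddings.
    ref_emb:  (M, D) first-frame pixel embeddings.
    ref_mask: (M,) bool, True where the reference pixel belongs to the object.
    Returns, per current-frame pixel, the distance to the nearest object pixel."""
    d = torch.cdist(curr_emb, ref_emb[ref_mask])   # (N, n_object_pixels)
    return d.min(dim=1).values

curr = torch.randn(1000, 64)
ref = torch.randn(1200, 64)
mask = torch.rand(1200) > 0.5
cue = global_match(curr, ref, mask)   # fed, alongside local matches, into a segmentation head
```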
Inspired by recent advances in leveraging multiple modalities in machine translation, we introduce an encoder-decoder pipeline that uses (1) specific objects within an image and their object labels, and (2) a language model for decoding a joint embedding of the object features and their labels. Our pipeline merges the previously detected objects from the image with their object labels and then learns the sequences of captions describing the particular image. The decoder model learns to extract descriptions for the image from scratch by decoding the joint representation of the object visual features and their object classes, conditioned by the encoder component. The idea of the model is to concentrate only on the specific objects of the image and their labels when generating descriptions, rather than on the visual features of the entire image. The model still needs further calibration, by adjusting its parameters and settings, to achieve better accuracy and performance.
http://arxiv.org/abs/1902.09969
Modern Machine Translation (MT) systems perform consistently well on clean, in-domain text. However, human-generated text, particularly in the realm of social media, is full of typos, slang, dialect, idiolect and other noise which can have a disastrous impact on the accuracy of the output translation. In this paper we leverage the Machine Translation of Noisy Text (MTNT) dataset to enhance the robustness of MT systems by emulating naturally occurring noise in otherwise clean data. By synthesizing noise in this manner, we are ultimately able to make a vanilla MT system resilient to naturally occurring noise and partially mitigate the resulting loss in accuracy.
http://arxiv.org/abs/1902.09508
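A minimal sketch of the kind of synthetic noise injection described above. The specific perturbations (adjacent-character swaps and character drops) and their probabilities are illustrative assumptions, not the MTNT-derived recipe.

```python
import random

def add_noise(sentence, p_swap=0.05, p_drop=0.03):
    """Emulate typos by randomly dropping characters and swapping adjacent ones."""
    chars = list(sentence)
    out, i = [], 0
    while i < len(chars):
        r = random.random()
        if r < p_drop:
            i += 1                              # drop this character
            continue
        if r < p_drop + p_swap and i + 1 < len(chars):
            out += [chars[i + 1], chars[i]]     # swap with the next character
            i += 2
            continue
        out.append(chars[i])
        i += 1
    return "".join(out)

# Noised copies of clean parallel sentences can then be mixed into MT training.
print(add_noise("this is a clean training sentence"))
```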
We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets. We have developed a strong and robust question engine that leverages scene graph structures to create 22M diverse reasoning questions, all of which come with functional programs that represent their semantics. We use the programs to gain tight control over the answer distribution and present a new tunable smoothing technique to mitigate language biases. Accompanying the dataset is a suite of new metrics that evaluate essential qualities such as consistency, grounding and plausibility. An extensive analysis is performed for baselines as well as state-of-the-art models, providing fine-grained results for different question types and topologies. Whereas a blind LSTM obtains a mere 42.1%, and strong VQA models achieve 54.1%, human performance tops out at 89.3%, offering ample opportunity for new research to explore. We strongly hope GQA will provide an enabling resource for the next generation of models with enhanced robustness, improved consistency, and deeper semantic understanding of images and language.
http://arxiv.org/abs/1902.09506
In this work we present a lightweight, unsupervised learning pipeline for dense depth, optical flow and egomotion estimation from the sparse event output of the Dynamic Vision Sensor (DVS). To tackle this low-level vision task, we use a novel encoder-decoder neural network architecture, ECN. Our work is the first monocular pipeline that generates dense depth and optical flow from sparse event data only. The network works in self-supervised mode and has just 150k parameters. We evaluate our pipeline on the MVSEC self-driving dataset and present results for depth, optical flow and egomotion estimation. Due to the lightweight design, the inference part of the network runs at 250 FPS on a single GPU, making the pipeline ready for realtime robotics applications. Our experiments demonstrate significant improvements upon previous works that used deep learning on event data, as well as the ability of our pipeline to perform well during both day and night.
http://arxiv.org/abs/1809.08625
We introduce a novel method for multilingual transfer that utilizes deep contextual embeddings, pretrained in an unsupervised fashion. While contextual embeddings have been shown to yield richer representations of meaning compared to their static counterparts, aligning them poses a challenge due to their dynamic nature. To this end, we construct context-independent variants of the original monolingual spaces and utilize their mapping to derive an alignment for the context-dependent spaces. This mapping readily supports processing of a target language, improving transfer by context-aware embeddings. Our experimental results demonstrate the effectiveness of this approach for zero-shot and few-shot learning of dependency parsing. Specifically, our method consistently outperforms the previous state-of-the-art on 6 target languages, yielding an improvement of 6.8 LAS points on average.
http://arxiv.org/abs/1902.09492
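A minimal sketch of the alignment step outlined above: learn an orthogonal map between two context-independent (anchor) embedding spaces via the Procrustes solution, then apply it to contextual vectors. The anchor construction and the dictionary of translation pairs are assumed inputs here.

```python
import numpy as np

def procrustes_align(X_src, Y_tgt):
    """Orthogonal map W minimizing ||X W - Y||_F for paired anchor embeddings (rows)."""
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

# X_src, Y_tgt: (n_pairs, dim) context-independent anchors for translation pairs.
X_src, Y_tgt = np.random.randn(500, 300), np.random.randn(500, 300)
W = procrustes_align(X_src, Y_tgt)

# The same W is then applied to the context-dependent (contextual) embeddings of
# the source language before feeding them to the target-language parser.
contextual_vec = np.random.randn(300)
aligned_vec = contextual_vec @ W
```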
Multimodal attentional networks are currently state-of-the-art models for Visual Question Answering (VQA) tasks involving real images. Although attention allows the model to focus on the visual content relevant to the question, this simple mechanism is arguably insufficient to model the complex reasoning features required for VQA or other high-level tasks. In this paper, we propose MuRel, a multimodal relational network which is learned end-to-end to reason over real images. Our first contribution is the introduction of the MuRel cell, an atomic reasoning primitive representing interactions between the question and image regions by a rich vectorial representation, and modeling region relations with pairwise combinations. Secondly, we incorporate the cell into a full MuRel network, which progressively refines visual and question interactions, and can be leveraged to define visualization schemes finer than mere attention maps. We validate the relevance of our approach with various ablation studies, and show its superiority to attention-based methods on three datasets: VQA 2.0, VQA-CP v2 and TDIUC. Our final MuRel network is competitive with or outperforms state-of-the-art results in this challenging context. Our code is available at: https://github.com/Cadene/murel.bootstrap.pytorch
http://arxiv.org/abs/1902.09487
This paper presents the formal release of MedMentions, a new manually annotated resource for the recognition of biomedical concepts. What distinguishes MedMentions from other annotated biomedical corpora is its size (over 4,000 abstracts and over 350,000 linked mentions), as well as the size of the concept ontology (over 3 million concepts from UMLS 2017) and its broad coverage of biomedical disciplines. In addition to the full corpus, a sub-corpus of MedMentions is also presented, comprising annotations for a subset of UMLS 2017 targeted towards document retrieval. To encourage research in Biomedical Named Entity Recognition and Linking, data splits for training and testing are included in the release, and a baseline model and its metrics for entity linking are also described.
http://arxiv.org/abs/1902.09476
Traditional models of rational action treat the agent as though it is cleanly separated from its environment, and can act on that environment from the outside. Such agents have a known functional relationship with their environment, can model their environment in every detail, and do not need to reason about themselves or their internal parts. We provide an informal survey of obstacles to formalizing good reasoning for agents embedded in their environment. Such agents must optimize an environment that is not of type “function”; they must rely on models that fit within the modeled environment; and they must reason about themselves as just another physical system, made of parts that can be modified and that can work at cross purposes.
http://arxiv.org/abs/1902.09469
Long-range indoor navigation requires guiding robots with noisy sensors and controls through cluttered environments along paths that span a variety of buildings. We achieve this with PRM-RL, a hierarchical robot navigation method in which reinforcement learning agents that map noisy sensors to robot controls learn to solve short-range obstacle avoidance tasks, and then sampling-based planners map where these agents can reliably navigate in simulation; these roadmaps and agents are then deployed on-robot, guiding the robot along the shortest path where the agents are likely to succeed. Here we use Probabilistic Roadmaps (PRMs) as the sampling-based planner and AutoRL as the reinforcement learning method in the indoor navigation context. We evaluate the method in simulation for kinematic differential drive and kinodynamic car-like robots in several environments, and on-robot for differential-drive robots at two physical sites. Our results show PRM-RL with AutoRL is more successful than several baselines, is robust to noise, and can guide robots over hundreds of meters in the face of noise and obstacles in both simulation and on-robot, including over 3.3 kilometers of physical robot navigation.
http://arxiv.org/abs/1902.09458
When an image classifier makes a prediction, which parts of the image are relevant and why? We can rephrase this question to ask: which parts of the image, if they were not seen by the classifier, would most change its decision? Producing an answer requires marginalizing over images that could have been seen but weren’t. We can sample plausible image in-fills by conditioning a generative model on the rest of the image. We then optimize to find the image regions that most change the classifier’s decision after in-fill. Our approach contrasts with ad-hoc in-filling approaches, such as blurring or injecting noise, which generate inputs far from the data distribution, and ignore informative relationships between different parts of the image. Our method produces more compact and relevant saliency maps, with fewer artifacts compared to previous methods.
http://arxiv.org/abs/1807.08024
There has been considerable attention devoted to models that learn to jointly infer an expression’s syntactic structure and its semantics. Yet, Nangia and Bowman (2018) have recently shown that the current best systems fail to learn the correct parsing strategy on mathematical expressions generated from a simple context-free grammar. In this work, we present a recursive model inspired by Choi et al. (2018) that reaches near-perfect accuracy on this task. Our model is composed of two separate modules for syntax and semantics. They are cooperatively trained with standard continuous and discrete optimization schemes. Our model does not require any linguistic structure for supervision and its recursive nature allows for out-of-domain generalization with little loss in performance. Additionally, our approach performs competitively on several natural language tasks, such as Natural Language Inference or Sentiment Analysis.
http://arxiv.org/abs/1902.09393
Automatic License Plate Recognition (ALPR) is a challenging problem for the research community due to its potential applicability in diverse geographical conditions across the globe with varying license plate parameters. Any ALPR system includes three main modules, viz. localization of the license plate, segmentation of the characters therein and recognition of the segmented characters. In real-life applications where the images are captured over days and nights in an outdoor environment with varying lighting and weather conditions, varying pollution levels and wind turbulence, localization, segmentation and recognition become challenging tasks. The tasks become more complex if the license plate does not conform to the standards laid down by the corresponding Motor Vehicles Department in terms of various features, e.g. area and aspect ratio of the license plate, background color, foreground color, shape, number of lines, font face/size of characters, spacing between characters etc. Besides, license plates are often dirty, broken, scratched, bent or tilted. All these add to the challenges in developing an effective ALPR system.
http://arxiv.org/abs/1902.09385
Biomedical image segmentation is an important task in many medical applications. Segmentation methods based on convolutional neural networks attain state-of-the-art accuracy; however, they typically rely on supervised training with large labeled datasets. Labeling datasets of medical images requires significant expertise and time, and is infeasible at large scales. To tackle the lack of labeled data, researchers use techniques such as hand-engineered preprocessing steps, hand-tuned architectures, and data augmentation. However, these techniques involve costly engineering efforts, and are typically dataset-specific. We present an automated data augmentation method for medical images. We demonstrate our method on the task of segmenting magnetic resonance imaging (MRI) brain scans, focusing on the one-shot segmentation scenario – a practical challenge in many medical applications. Our method requires only a single segmented scan, and leverages other unlabeled scans in a semi-supervised approach. We learn a model of transforms from the images, and use the model along with the labeled example to synthesize additional labeled training examples for supervised segmentation. Each transform is comprised of a spatial deformation field and an intensity change, enabling the synthesis of complex effects such as variations in anatomy and image acquisition procedures. Augmenting the training of a supervised segmenter with these new examples provides significant improvements over state-of-the-art methods for one-shot biomedical image segmentation. Our code is available at https://github.com/xamyzhao/brainstorm.
http://arxiv.org/abs/1902.09383
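A minimal sketch of how one synthesized labeled example could be produced as described above: the same spatial deformation warps both the labeled scan and its segmentation, while an intensity change is applied to the image only. The random fields here are stand-ins for the learned transform model, and the 2D setting is a simplification.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def synthesize(image, labels, flow, intensity_delta):
    """Warp `image` and `labels` with the displacement field `flow` (2, H, W),
    then add the intensity change to the warped image only."""
    h, w = image.shape
    grid = np.mgrid[0:h, 0:w].astype(float)
    coords = grid + flow
    warped_img = map_coordinates(image, coords, order=1)
    warped_lab = map_coordinates(labels, coords, order=0)  # nearest keeps labels discrete
    return warped_img + intensity_delta, warped_lab

image = np.random.rand(64, 64)
labels = (image > 0.5).astype(np.int32)
flow = np.random.randn(2, 64, 64) * 1.5      # stand-in for a learned deformation field
delta = np.random.randn(64, 64) * 0.05       # stand-in for a learned intensity change
new_img, new_lab = synthesize(image, labels, flow, delta)
```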
We present a novel method to architect automatic linguistic transformations for a number of tasks, including controlled grammatical or lexical changes, style transfer, text generation, and machine translation. Our approach consists in creating an abstract representation of a sentence’s meaning and grammar, which we use as input to an encoder-decoder network trained to reproduce the original sentence. Manipulating the abstract representation allows the transformation of sentences according to user-provided parameters, both grammatically and lexically, in any combination. Additionally, the same architecture can be used for controlled text generation, and even unsupervised machine translation, where the network is used to translate between different languages using no parallel corpora outside of a lemma-level dictionary. This strategy holds the promise of enabling many tasks that were hitherto outside the scope of NLP techniques for want of sufficient training data. We provide empirical evidence for the effectiveness of our approach by reproducing and transforming English sentences, and evaluating the results both manually and automatically. A single unsupervised model is used for all tasks. We report BLEU scores between 55.29 and 81.82 for sentence reproduction as well as back-and-forth grammatical transformations between 14 class pairs.
http://arxiv.org/abs/1902.09381
Visual dialog (VisDial) is a task which requires an AI agent to answer a series of questions grounded in an image. Unlike in visual question answering (VQA), the series of questions should be able to capture a temporal context from a dialog history and exploit visually-grounded information. A problem called visual reference resolution involves these challenges, requiring the agent to resolve ambiguous references in a given question and find the references in a given image. In this paper, we propose Dual Attention Networks (DAN) for visual reference resolution. DAN consists of two kinds of attention networks, REFER and FIND. Specifically, the REFER module learns latent relationships between a given question and a dialog history by employing a self-attention mechanism. The FIND module takes image features and reference-aware representations (i.e., the output of the REFER module) as input, and performs visual grounding via a bottom-up attention mechanism. We qualitatively and quantitatively evaluate our model on the VisDial v1.0 and v0.9 datasets, showing that DAN outperforms the previous state-of-the-art model by a significant margin (2.0% on NDCG).
http://arxiv.org/abs/1902.09368
We present a novel anytime heuristic (ALMA), inspired by the human principle of altruism, for solving the assignment problem. ALMA is decentralized, completely uncoupled, and requires no communication between the participants. We prove an upper bound on the convergence speed that is polynomial in the desired number of resources and competing agents per resource; crucially, in the realistic case where the aforementioned quantities are bounded independently of the total number of agents/resources, the convergence time remains constant as the total problem size increases. We have evaluated ALMA under three test cases: (i) an anti-coordination scenario where agents with similar preferences compete over the same set of actions, (ii) a resource allocation scenario in an urban environment, under a constant-time constraint, and finally, (iii) an on-line matching scenario using real passenger-taxi data. In all of the cases, ALMA was able to reach high social welfare, while being orders of magnitude faster than the centralized, optimal algorithm. The latter allows our algorithm to scale to realistic scenarios with hundreds of thousands of agents, e.g., vehicle coordination in urban environments.
http://arxiv.org/abs/1902.09359
Food image recognition is one of the promising applications of visual object recognition in computer vision. In this study, a small-scale dataset consisting of 5822 images of ten categories and a five-layer CNN was constructed to recognize these images. The bag-of-features (BoF) model coupled with a support vector machine (SVM) was first evaluated for image classification, resulting in an overall accuracy of 56%, while the CNN model performed much better with an overall accuracy of 74%. Data augmentation techniques based on geometric transformations were applied to increase the size of the training set, which achieved a significantly improved accuracy of more than 90% while preventing the overfitting observed when training the CNN on the raw data. Further improvements can be expected by collecting more images and optimizing the network architecture and hyper-parameters.
http://arxiv.org/abs/1612.00983
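A minimal sketch of geometric data augmentation of the kind described above, using torchvision transforms; the particular operations and parameters are illustrative, since the study does not list its exact choices here.

```python
from torchvision import transforms

# Geometric transformations that enlarge the effective training set
# without changing the image labels.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

# Applied on the fly inside a Dataset/DataLoader during CNN training:
# augmented_tensor = augment(pil_image)
```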
The behavior of self-driving cars must be compatible with an enormous set of conflicting and ambiguous objectives, from law, from ethics, from the local culture, and so on. This paper describes a new way to conveniently define the desired behavior for autonomous agents, which we use on the self-driving cars developed at nuTonomy. We define a “rulebook” as a pre-ordered set of “rules”, each akin to a violation metric on the possible outcomes (“realizations”). The rules are partially ordered by priority. The semantics of a rulebook imposes a pre-order on the set of realizations. We study the compositional properties of the rulebooks, and we derive which operations we can allow on the rulebooks to preserve previously-introduced constraints. While we demonstrate the application of these techniques in the self-driving domain, the methods are domain-independent.
http://arxiv.org/abs/1902.09355
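A minimal sketch of how a rulebook can rank two candidate realizations in the simplified case where the rules happen to be totally ordered by priority: compare violation scores lexicographically, highest-priority rule first. The example rules and the total ordering are simplifying assumptions; the paper allows a general pre-order over rules.

```python
def violations(realization, rules):
    """Evaluate each rule's violation metric, ordered from highest to lowest priority."""
    return tuple(rule(realization) for rule in rules)

def prefer(r1, r2, rules):
    """Return the preferred realization under a totally ordered rulebook."""
    return r1 if violations(r1, rules) <= violations(r2, rules) else r2

# Hypothetical rules: lower is better for each violation metric.
rules = [
    lambda r: r["collisions"],           # safety first
    lambda r: r["lane_departure_m"],     # then lawfulness
    lambda r: r["discomfort"],           # then comfort
]
a = {"collisions": 0, "lane_departure_m": 1.2, "discomfort": 0.3}
b = {"collisions": 0, "lane_departure_m": 0.0, "discomfort": 0.9}
best = prefer(a, b, rules)   # b is preferred: fewer lane departures despite more discomfort
```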
We study strategic similarity of game positions in two-player extensive games of perfect information, by looking at the structure of their local game trees, with the aim of improving the performance of game playing agents in detecting forcing continuations. We present a range of measures over the induced game trees and compare them against benchmark problems in chess, observing a promising level of accuracy in matching up trap states.
http://arxiv.org/abs/1902.09335
We consider the task of controlling a quadrotor to hover in front of a freely moving user, using input data from an onboard camera. On this specific task we compare two widespread learning paradigms: a mediated approach, which learns a high-level state from the input and then uses it for deriving control signals; and an end-to-end approach, which skips high-level state estimation altogether. We show that despite their fundamental difference, both approaches yield equivalent performance on this task. We finally qualitatively analyze the behavior of a quadrotor implementing such approaches.
http://arxiv.org/abs/1809.08881
We study multi-round response generation in visual dialog systems, where a response is generated according to a visually grounded conversational history. Given a triplet: an image, Q&A history, and current question, all the prevailing methods follow a codec (i.e., encoder-decoder) fashion in the supervised learning paradigm: a multimodal encoder encodes the triplet into a feature vector, which is then fed into the decoder for the current answer generation, supervised by the ground-truth answer. However, this conventional supervised learning does not take into account the impact of imperfect history in the codec training, violating the conversational nature of visual dialog and thus making the codec more inclined to learn dataset bias rather than visual reasoning. To this end, inspired by the actor-critic policy gradient in reinforcement learning, we propose a novel training paradigm called Gold-Critic Sequence Training (GCST). Specifically, we intentionally impose wrong answers in the history, obtaining an adverse reward, and observe how the historic error impacts the codec’s future behavior by subtracting the gold-critic baseline — the reward obtained by using the ground-truth history — from the adverse reward. Moreover, to make the codec more sensitive to the history, we propose a novel attention network called Recurrent Co-Attention Network (RCAN) which can be effectively trained using GCST. Experimental results on three benchmarks: VisDial v0.9 & v1.0 and GuessWhat?!, show that the proposed GCST strategy consistently outperforms state-of-the-art supervised counterparts under all metrics.
http://arxiv.org/abs/1902.09326
Targeted sentiment classification aims at determining the sentimental tendency towards specific targets. Most of the previous approaches model context and target words using recurrent neural networks such as LSTMs in conjunction with attention mechanisms. However, LSTM networks are difficult to parallelize because of their sequential nature. Moreover, since full backpropagation over the sequence requires large amounts of memory, essentially every implementation of backpropagation through time is the truncated version, which brings difficulty in remembering long-term patterns. To address these issues, this paper proposes an Attentional Encoder Network (AEN) for targeted sentiment classification. Contrary to previous LSTM-based works, AEN eschews complex recurrent neural networks and employs attention-based encoders for the modeling between context and target, which can excavate the rich introspective and interactive semantic information from the word embeddings without considering the distance between words. This paper also raises the label unreliability issue and introduces a label smoothing regularization term into the loss function to encourage the model to be less confident with the training labels. Experimental results on three benchmark datasets demonstrate that our model achieves comparable or superior performance with a lightweight model size.
http://arxiv.org/abs/1902.09314
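A minimal PyTorch sketch of the label smoothing regularization mentioned above, which spreads a small amount of probability mass over non-target classes so the model is less confident in the training labels; the smoothing factor is illustrative.

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, eps=0.1):
    """Cross entropy against a smoothed target distribution:
    (1 - eps) on the gold class plus eps spread uniformly over all classes."""
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, eps / n_classes)
    smooth.scatter_(-1, targets.unsqueeze(-1), 1.0 - eps + eps / n_classes)
    return -(smooth * log_probs).sum(dim=-1).mean()

logits = torch.randn(8, 3)                  # e.g., negative / neutral / positive
targets = torch.randint(0, 3, (8,))
loss = label_smoothing_loss(logits, targets)
```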
In this paper, we present a HAnd Mesh Recovery (HAMR) framework to tackle the problem of reconstructing the full 3D mesh of a human hand from a single RGB image. In contrast to existing research on 2D or 3D hand pose estimation from RGB and/or depth image data, HAMR can provide a more expressive and useful mesh representation for monocular hand image understanding. In particular, the mesh representation is achieved by parameterizing a generic 3D hand model with shape and relative 3D joint angles. By utilizing this mesh representation, we can easily compute the 3D joint locations via linear interpolation between the vertices of the mesh, and obtain the 2D joint locations by projecting the 3D joints.
http://arxiv.org/abs/1902.09305
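A minimal sketch of the two derived quantities mentioned above: 3D joints as fixed linear combinations (interpolations) of mesh vertices, and 2D joints obtained by projecting the 3D joints with camera intrinsics. The regressor weights, intrinsics, and the vertex/joint counts are placeholders, not values from the paper.

```python
import numpy as np

def joints_from_mesh(vertices, regressor):
    """vertices: (V, 3) mesh vertices; regressor: (J, V) fixed interpolation weights."""
    return regressor @ vertices             # (J, 3) 3D joint locations

def project(joints_3d, K):
    """Pinhole projection of 3D joints with camera intrinsics K (3x3)."""
    p = (K @ joints_3d.T).T                 # (J, 3) homogeneous image coordinates
    return p[:, :2] / p[:, 2:3]             # (J, 2) pixel coordinates

V, J = 778, 21                               # placeholder hand-mesh sizes
vertices = np.random.rand(V, 3) + np.array([0.0, 0.0, 0.5])   # keep depth positive
regressor = np.random.rand(J, V); regressor /= regressor.sum(axis=1, keepdims=True)
K = np.array([[500., 0., 160.], [0., 500., 120.], [0., 0., 1.]])
joints_2d = project(joints_from_mesh(vertices, regressor), K)
```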
In this work we consider the application of convolutional neural networks (CNNs) for pixel-wise labeling (a.k.a., semantic segmentation) of remote sensing imagery (e.g., aerial color or hyperspectral imagery). Remote sensing imagery is usually stored in the form of very large images, referred to as “tiles”, which are too large to be segmented directly using most CNNs and their associated hardware. As a result, during label inference, smaller sub-images, called “patches”, are processed individually and then “stitched” (concatenated) back together to create a tile-sized label map. This approach suffers from computational inefficiency and can result in discontinuities at output boundaries. We propose a simple alternative approach in which the input size of the CNN is dramatically increased only during label inference. This does not avoid stitching altogether, but substantially mitigates its limitations. We evaluate the performance of the proposed approach against a conventional stitching approach using two popular segmentation CNN models and two large-scale remote sensing imagery datasets. The results suggest that the proposed approach substantially reduces label inference time, while also yielding modest overall label accuracy increases. This approach contributed to our winning entry (overall performance) in the INRIA building labeling competition.
http://arxiv.org/abs/1805.12219
In many respects, the human mind remains poorly understood by neuroscience. Nevertheless, for decades the scientific community has proposed computational models that try to simulate its parts, specific functions, or its behavior in different situations. The most complete model in this line is undoubtedly the LIDA model, proposed by Stan Franklin with the aim of serving as a generic computational architecture for several applications. The present project, inspired by the LIDA model, applies it to movie recommendation; the resulting model, called MIRA (Movie Intelligent Recommender Agent), achieved precision comparable to a traditional model when evaluated under the same test conditions. Moreover, the proposed model improved its precision in tests with volunteers, again demonstrating its performance as a cognitive model when executed with small data volumes. Since the proposed model behaved similarly to traditional models under conditions expected to resemble those of natural systems, MIRA reinforces the applicability of LIDA as a path for studying and building computational agents inspired by neural behavior.
http://arxiv.org/abs/1902.09291
Convolutional neural networks have been used to achieve a string of successes during recent years, but their lack of interpretability remains a serious issue. Adversarial examples are designed to deliberately fool neural networks into making any desired incorrect classification, potentially with very high certainty. We underline the severity of the issue by presenting a technique that hides such adversarial attacks in regions of high complexity, such that they are imperceptible even to an astute observer.
http://arxiv.org/abs/1902.09286
Relation Extraction (RE) aims to label relations between groups of marked entities in raw text. Most current RE models learn context-aware representations of the target entities that are then used to establish the relation between them. This works well for intra-sentence RE, and we call such relations first-order relations. However, this methodology can sometimes fail to capture complex and long dependencies. To address this, we hypothesize that at times two target entities can be explicitly connected via a context token. We refer to such indirect relations as second-order relations and describe an efficient implementation for computing them. These second-order relation scores are then combined with first-order relation scores. Our empirical results show that the proposed method leads to state-of-the-art performance on two biomedical datasets.
http://arxiv.org/abs/1902.09271
In this paper, we propose a solution which uses state-of-the-art techniques in Deep Learning to tackle the problem of Bengali Handwritten Character Recognition (HCR). Our method requires fewer training iterations than most other comparable methods. We employ Transfer Learning on ResNet-50, a state-of-the-art deep Convolutional Neural Network model pretrained on the ImageNet dataset. We also use other techniques, such as a modified version of the One Cycle Policy and varying the input image sizes, to ensure that training occurs fast. We use the BanglaLekha-Isolated dataset for evaluation of our technique, which consists of 84 classes (50 Basic, 10 Numerals and 24 Compound Characters). We are able to achieve 96.12% accuracy in just 47 epochs on the BanglaLekha-Isolated dataset. When comparing our method with that of other researchers, considering the number of classes and without using Ensemble Learning, the proposed solution achieves a state-of-the-art result for Handwritten Bengali Character Recognition. Code and weight files are available at https://github.com/swagato-c/bangla-hwcr-present.
http://arxiv.org/abs/1902.11133
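A minimal sketch of the transfer learning setup described above: an ImageNet-pretrained ResNet-50 with its final layer replaced for the 84 BanglaLekha-Isolated classes. The freezing strategy shown is a common recipe and an assumption, not necessarily the authors' schedule.

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet-50 and replace the classifier head.
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 84)   # 50 basic + 10 numerals + 24 compound

# Common transfer-learning recipe: freeze the backbone, train the new head,
# then unfreeze and fine-tune end to end.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
```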
An increasing amount of research has shed light on machine perception of audio events, most of which concerns detection and classification tasks. However, human-like perception of audio scenes involves not only detecting and classifying audio sounds, but also summarizing the relationships between different audio events. Comparable research, such as image captioning, has been conducted, yet the audio field is still quite barren. This paper introduces a manually-annotated dataset for audio captioning. The purpose is to automatically generate natural sentences for audio scene description and to bridge the gap between machine perception of audio and of images. The whole dataset is labelled in Mandarin and we also include translated English annotations. A baseline encoder-decoder model is provided for both English and Mandarin. Similar BLEU scores are obtained for both languages: our model can generate understandable and data-related captions based on the dataset.
http://arxiv.org/abs/1902.09254
Real-world project scheduling often requires flexibility in terms of the selection and the exact length of alternative production activities. Moreover, the simultaneous scheduling of multiple lots is mandatory in many production planning applications. To meet these requirements, a new flexible resource-constrained multi-project scheduling problem is introduced where both decisions (activity selection flexibility and time flexibility) are integrated. Besides the minimization of makespan, two alternative objectives inspired by a steel industry application case are presented: maximization of balanced length of selected activities (time balance) and maximization of balanced resource utilization (resource balance). New mixed integer and constraint programming (CP) models are proposed for the developed integrated flexible project scheduling problem. The real-world applicability of the suggested CP models is shown by solving large steel industry instances with the CP Optimizer of IBM ILOG CPLEX. Furthermore, benchmark instances on flexible resource-constrained project scheduling problems (RCPSP) are solved to optimality.
http://arxiv.org/abs/1902.09244
In this paper, we propose a novel pretraining-based encoder-decoder framework, which generates the output sequence from the input sequence in a two-stage manner. For the encoder of our model, we encode the input sequence into context representations using BERT. The decoder operates in two stages. In the first stage, we use a Transformer-based decoder to generate a draft output sequence. In the second stage, we mask each word of the draft sequence and feed it to BERT; then, by combining the input sequence and the draft representation generated by BERT, we use a Transformer-based decoder to predict the refined word for each masked position. To the best of our knowledge, our approach is the first method to apply BERT to text generation tasks. As the first step in this direction, we evaluate our proposed method on the text summarization task. Experimental results show that our model achieves a new state of the art on both the CNN/Daily Mail and New York Times datasets.
http://arxiv.org/abs/1902.09243
Haptic sensation is an important modality for interacting with the real world. This paper proposes a general framework for inferring haptic forces on the surface of a 3D structure from internal deformations using a small number of physical sensors instead of employing dense sensor arrays. Using machine learning techniques, we optimize the number of sensors and their placement and are able to obtain high-precision force inference for a robotic limb using as few as 9 sensors. For the optimal and sparse placement of the measurement units (strain gauges), we employ data-driven methods based on data obtained by finite element simulation. We compare data-driven approaches with model-based methods relying on geometric distance and information criteria such as entropy and mutual information. We validate our approach on a modified limb of the Poppy robot and obtain 8 mm localization precision.
http://arxiv.org/abs/1902.09241
Recent empirical works have successfully used unlabeled data to learn feature representations that are broadly useful in downstream classification tasks. Several of these methods are reminiscent of the well-known word2vec embedding algorithm: leveraging availability of pairs of semantically “similar” data points and “negative samples,” the learner forces the inner product of representations of similar pairs with each other to be higher on average than with negative samples. The current paper uses the term contrastive learning for such algorithms and presents a theoretical framework for analyzing them by introducing latent classes and hypothesizing that semantically similar points are sampled from the same latent class. This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task that is comprised of a subset of the same set of latent classes. Our generalization bound also shows that learned representations can reduce (labeled) sample complexity on downstream tasks. We conduct controlled experiments in both the text and image domains to support the theory.
http://arxiv.org/abs/1902.09229
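A minimal sketch of the contrastive objective the framework above analyzes: push the representation of an anchor closer to a semantically similar point than to a negative sample, here with a logistic loss over inner products. The encoder and batching details are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(f_x, f_pos, f_neg):
    """Logistic contrastive loss: encourage <f(x), f(x+)> to exceed <f(x), f(x-)>.
    f_x, f_pos, f_neg: (batch, dim) representations of anchors, positives, negatives."""
    pos_score = (f_x * f_pos).sum(dim=-1)
    neg_score = (f_x * f_neg).sum(dim=-1)
    # log(1 + exp(neg - pos)) == softplus(neg - pos)
    return F.softplus(neg_score - pos_score).mean()

f_x, f_pos, f_neg = (torch.randn(32, 128) for _ in range(3))
loss = contrastive_loss(f_x, f_pos, f_neg)
```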
Recent advances in conditional image generation tasks, such as image-to-image translation and image inpainting, are largely attributable to the success of conditional GAN models, which are often optimized by the joint use of the GAN loss with the reconstruction loss. However, we reveal that this training recipe, shared by almost all existing methods, causes one critical side effect: lack of diversity in output samples. In order to accomplish both training stability and multimodal output generation, we propose novel training schemes with a new set of losses, named moment reconstruction losses, that simply replace the reconstruction loss. We show that our approach is applicable to any conditional generation task by performing thorough experiments on image-to-image translation, super-resolution and image inpainting using the Cityscapes and CelebA datasets. Quantitative evaluations also confirm that our methods achieve great diversity in outputs while retaining or even improving the visual fidelity of generated samples.
https://arxiv.org/abs/1902.09225
This is an official PyTorch implementation of Deep High-Resolution Representation Learning for Human Pose Estimation. In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from the other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through superior pose estimation results on two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset. The code and models are publicly available at https://github.com/leoxiaobin/deep-high-resolution-net.pytorch.
http://arxiv.org/abs/1902.09212
Conventional SLAM algorithms make the strong assumption that the scene is static, which limits their application in real environments. This paper tackles the challenging visual SLAM issue of moving objects in dynamic environments. We present GMC, a grid-based motion clustering approach: a lightweight dynamic object filtering method that is free from high-power and expensive processors. GMC encapsulates motion consistency as the statistical likelihood of detected key points within a certain region. Using this method, we can provide a real-time and robust correspondence algorithm that can differentiate dynamic objects from static backgrounds. We evaluate our system on the public TUM dataset. Compared with state-of-the-art methods, our system provides more accurate results by detecting dynamic objects.
http://arxiv.org/abs/1902.09193