Executable semantic parsing is the task of converting natural language utterances into logical forms that can be directly used as queries to get a response. We build a transfer learning framework for executable semantic parsing. We show that the framework is effective for Question Answering (Q&A) as well as for Spoken Language Understanding (SLU). We further investigate the case where a parser on a new domain can be learned by exploiting data on other domains, either via multi-task learning between the target domain and an auxiliary domain or via pre-training on the auxiliary domain and fine-tuning on the target domain. With either flavor of transfer learning, we are able to improve performance on most domains; we experiment with public data sets such as Overnight and NLmaps as well as with commercial SLU data. We report the first parsing results on Overnight and state-of-the-art results on NLmaps. The experiments carried out on data sets that are different in nature show how executable semantic parsing can unify different areas of NLP such as Q&A and SLU.
http://arxiv.org/abs/1903.04521
We propose a data-driven method for automatic deception detection in real-life trial data using visual and verbal cues. Using OpenFace with facial action unit recognition, we analyze the movement of the witness's facial features when posed with questions, and we analyze the acoustic patterns using OpenSmile. We then perform a lexical analysis on the spoken words, emphasizing the use of pauses and utterance breaks, and feed the result to a Support Vector Machine to predict deceit or truth. We then try out a method to incorporate utterance-based fusion of visual and lexical analysis, using string-based matching.
http://arxiv.org/abs/1903.04484
Strictly proper scoring rules (SPSR) are widely used when designing incentive mechanisms to elicit private information from strategic agents using realized ground truth signals, and they can help quantify the value of elicited information. In this paper, we extend such scoring rules to settings where a mechanism designer does not have access to ground truth. We consider two such settings: (i) a setting where the mechanism designer has access to a noisy proxy version of the ground truth, with known biases; and (ii) the standard peer prediction setting, where agents' reports, and possibly some limited prior knowledge of ground truth, are the only sources of information that the mechanism designer has. We introduce surrogate scoring rules (SSR) for the first setting, which use the noisy ground truth to evaluate the quality of elicited information. We show that SSR preserve the strict properness of SPSR. Using SSR, we then develop a multi-task scoring mechanism, called uniform dominant truth serum (DTS), to achieve strict properness when there are sufficiently many tasks and agents, and when the mechanism designer only has access to agents' reports and one bit of information about the marginal of the entire set of tasks' ground truth. In comparison to standard equilibrium concepts in peer prediction, we show that DTS can achieve truthfulness in uniform dominant strategy in a multi-task setting when agents adopt the same strategy for all the tasks they are assigned (hence the term uniform). A salient feature of SSR and DTS is that they quantify the quality of information despite the lack of ground truth, just as proper scoring rules do in the setting with verification. Our method is verified both theoretically and empirically using data collected from real human participants.
http://arxiv.org/abs/1802.09158
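For intuition in the binary case, one de-biasing construction in the spirit of the abstract (a sketch of the standard unbiased-surrogate idea, not necessarily the paper's exact form): if the proxy $z$ flips the true outcome $y \in \{0,1\}$ with known rates $e_0 = \Pr(z{=}1 \mid y{=}0)$ and $e_1 = \Pr(z{=}0 \mid y{=}1)$, score a report $r$ by

$$S^{*}(r, 1) = \frac{(1-e_0)\,S(r,1) - e_1\,S(r,0)}{1 - e_0 - e_1}, \qquad S^{*}(r, 0) = \frac{(1-e_1)\,S(r,0) - e_0\,S(r,1)}{1 - e_0 - e_1}.$$

A direct computation gives $\mathbb{E}_{z}[S^{*}(r,z) \mid y] = S(r,y)$, so strict properness of the underlying rule $S$ carries over in expectation despite the noisy verification.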
The interpretation of electromodulated reflectance (ER) spectra of polar quantum wells (QWs) is difficult even for homogeneous structures because of the built-in electric field. In this work we compare the room-temperature contactless ER and photoluminescence (PL) spectra of polar GaN/AlGaN QWs with effective-mass band structure calculations. We show that the emission from the ground state transition is observed in PL, but the ER is dominated by transitions between excited states. This effect results from the polarization-induced built-in electric field in the QW, which breaks the selection rules that apply to square-like QWs, allowing many optical transitions that cannot be separately distinguished in the ER spectrum. We develop guidelines for the identification of optical transitions observed in PL and ER spectra. We conclude that an intrinsic Stokes shift, i.e., a shift between emission and absorption, is present even for homogeneous GaN/AlGaN QWs with large width, where the electron-hole wavefunction overlap for the fundamental transition is weak.
https://arxiv.org/abs/1903.04481
This paper proposes the novel task of video generation conditioned on a single semantic label map, which provides a good balance between flexibility and quality in the generation process. Different from typical end-to-end approaches, which model both scene content and dynamics in a single step, we propose to decompose this difficult task into two sub-problems. As current image generation methods produce better detail than video generation methods, we synthesize high quality content by only generating the first frame. Then we animate the scene based on its semantic meaning to obtain a temporally coherent video, giving us excellent results overall. We employ a cVAE for predicting optical flow as a beneficial intermediate step to generate a video sequence conditioned on the initial single frame. A semantic label map is integrated into the flow prediction module to achieve major improvements in the image-to-video generation process. Extensive experiments on the Cityscapes dataset show that our method outperforms all competing methods.
https://arxiv.org/abs/1903.04480
The pipelines of digital cameras include a component for computational color constancy, which aims to remove the influence of the illumination on the scene colors. One of the best known and most widely used benchmark datasets for this problem is the Color Checker dataset. However, due to the improper handling of the black level in its images, this dataset has been widely misused, and while some recent publications tried to alleviate the problem, they nevertheless erred and created additional wrong data. This paper gives a history of the Color Checker dataset's usage, describes the origins and reasons for its misuse, and explains the old and new mistakes introduced in the most recent publications that tried to handle the issue. This should, hopefully, help prevent similar misuse in the future.
https://arxiv.org/abs/1903.04473
Visual modes of communication are ubiquitous in modern life. Here we investigate drawing, the most basic form of visual communication. Communicative drawing poses a core challenge for theories of how vision and social cognition interact, requiring a detailed understanding of how sensory information and social context jointly determine what information is relevant to communicate. Participants (N=192) were paired in an online environment to play a sketching-based reference game. On each trial, both participants were shown the same four objects, but in different locations. The sketcher’s goal was to draw one of these objects - the target - so that the viewer could select it from the array. There were two types of trials: close, where objects belonged to the same basic-level category, and far, where objects belonged to different categories. We found that people exploited information in common ground with their partner to efficiently communicate about the target: on far trials, sketchers achieved high recognition accuracy while applying fewer strokes, using less ink, and spending less time on their drawings than on close trials. We hypothesized that humans succeed in this task by recruiting two core competencies: (1) visual abstraction, the capacity to perceive the correspondence between an object and a drawing of it; and (2) pragmatic inference, the ability to infer what information would help a viewer distinguish the target from distractors. To evaluate this hypothesis, we developed a computational model of the sketcher that embodied both competencies, instantiated as a deep convolutional neural network nested within a probabilistic program. We found that this model fit human data well and outperformed lesioned variants, providing an algorithmically explicit theory of how perception and social cognition jointly support contextual flexibility in visual communication.
http://arxiv.org/abs/1903.04448
Generative Adversarial Networks (GANs) can successfully approximate a probability distribution and produce realistic samples. However, open questions such as sufficient convergence conditions and mode collapse still persist. In this paper, we build on existing work in the area by proposing a novel framework for training the generator against an ensemble of discriminator networks, which can be seen as a one-student/multiple-teachers setting. We formalize this problem within the full-information adversarial bandit framework, where we evaluate the capability of an algorithm to select mixtures of discriminators for providing the generator with feedback during learning. To this end, we propose a reward function which reflects the progress made by the generator and dynamically updates the mixture weights allocated to each discriminator. We also draw connections between our algorithm and stochastic optimization methods and then show that existing approaches using multiple discriminators in the literature can be recovered from our framework. We argue that less expressive discriminators are smoother and have a coarse-grained view of the mode landscape, which forces the generator to cover a wide portion of the data distribution support. Highly expressive discriminators, on the other hand, ensure sample quality. Finally, experimental results show that our approach improves sample quality and diversity over existing baselines by effectively learning a curriculum. These results also support the claim that weaker discriminators have higher entropy, improving mode coverage. Keywords: multiple discriminators, curriculum learning, multiple resolutions discriminators, multi-armed bandits, generative adversarial networks, smooth discriminators, multi-discriminator gan training, multiple experts.
https://arxiv.org/abs/1808.00020
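A minimal sketch of a full-information, exponential-weights style update for the discriminator mixture, assuming one reward per discriminator that reflects generator progress (the reward definition and the learning rate eta are illustrative assumptions, not the paper's exact choices):

```python
import numpy as np

def update_mixture(weights, rewards, eta=0.1):
    """Full-information adversarial-bandit update over K discriminators.

    weights: (K,) current mixture over discriminators (sums to 1).
    rewards: (K,) per-discriminator rewards, e.g. generator progress.
    """
    w = weights * np.exp(eta * rewards)  # multiplicative, reward-weighted update
    return w / w.sum()                   # re-normalize to a distribution
```

The generator is then trained against the discriminators weighted by this mixture, which is what lets the ensemble act as a curriculum.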
We propose that intelligently combining models from the domains of Artificial Intelligence or Machine Learning with Physical and Expert models will yield a more “trustworthy” model than any one model from a single domain, given a complex and narrow enough problem. Based on mean-variance portfolio theory and bias-variance trade-off analysis, we prove that combining models from various domains produces a model that has lower risk, increasing user trust. We call such combined models physics-enhanced artificial intelligence (PEAI), and suggest use cases for PEAI.
http://arxiv.org/abs/1903.04442
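As a one-line illustration of the portfolio-style variance argument (standard algebra, not the paper's notation): for two unbiased models with error variances $\sigma_1^2 \le \sigma_2^2$ and error correlation $\rho$, the combination $w f_1 + (1-w) f_2$ has error variance

$$\sigma^2(w) = w^2\sigma_1^2 + (1-w)^2\sigma_2^2 + 2w(1-w)\rho\,\sigma_1\sigma_2,$$

which attains a value strictly below $\sigma_1^2$ at an interior $w$ whenever $\rho < \sigma_1/\sigma_2$, so pooling a physical model with a learned one reduces risk exactly when their errors are sufficiently decorrelated.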
Matrix syntax is a formal model of syntactic relations in language. The purpose of this paper is to explain its mathematical foundations, for an audience with some formal background. We make an axiomatic presentation, motivating each axiom on linguistic and practical grounds. The resulting mathematical structure resembles some aspects of quantum mechanics. Matrix syntax allows us to describe a number of language phenomena that are otherwise very difficult to explain, such as linguistic chains, and is arguably a more economical theory of language than most of the theories proposed in the context of the minimalist program in linguistics. In particular, sentences are naturally modelled as vectors in a Hilbert space with a tensor product structure, built from 2x2 matrices belonging to some specific group.
http://arxiv.org/abs/1710.00372
In this paper, we introduce a comprehensive toolkit, ETNLP, which can evaluate, extract, and visualize multiple sets of pre-trained word embeddings. First, for evaluation, ETNLP analyses the quality of pre-trained embeddings based on an input word analogy list. Second, for extraction, ETNLP provides a subset of the embeddings to be used in downstream NLP tasks. Finally, ETNLP has a visualization module for interactively exploring the embedded words. We demonstrate the effectiveness of ETNLP on our pre-trained word embeddings in Vietnamese. Specifically, we create a large Vietnamese word analogy list to evaluate the embeddings. We then utilize the pre-trained embeddings for the named entity recognition (NER) task in Vietnamese and achieve new state-of-the-art results on a benchmark dataset for the NER task. A video demonstration of ETNLP is available at this https URL. The source code and data are available at https://github.com/vietnlp/etnlp.
https://arxiv.org/abs/1903.04433
Early diagnosis of acute coronary artery occlusion based on electrocardiogram (ECG) findings is essential for prompt delivery of primary percutaneous coronary intervention. Current ST elevation (STE) criteria are specific but insensitive. Consequently, it is likely that many patients are missing out on potentially life-saving treatment. Experts combining non-specific ECG changes with STE detect ischaemia with higher sensitivity, but at the cost of specificity. We show that a deep learning model can detect ischaemia caused by acute coronary artery occlusion with a better balance of sensitivity and specificity than STE criteria, existing computerised analysers or expert cardiologists.
http://arxiv.org/abs/1903.04421
Robots need to understand their environment to perform their task. While a visual scene analysis process can be pre-programmed in closed environments, robots operating in open environments would benefit from the ability to learn it through interaction with their environment. This ability furthermore opens the way to the acquisition of affordance maps, in which the action capabilities of the robot structure its visual scene understanding. We propose an approach to build such affordance maps by relying on an interactive perception approach and online classification. In the proposed formalization of affordances, actions and effects are related to visual features, not objects, and they can be combined. We have tested the approach on three action primitives and on a real PR2 robot.
https://arxiv.org/abs/1903.04413
Excellent painters can use only a few strokes to create a fantastic painting, a hallmark of human intelligence and art. Reversing the simulator, i.e., interpreting images as stroke sequences, has been a challenging computer vision task in recent years. In this paper, we present SARA, a stroke-based artistic rendering agent that combines a neural renderer and deep reinforcement learning (DRL), allowing the machine to learn the ability to deconstruct images using strokes and create amazing visual effects. Our agent is an end-to-end program that converts natural images into paintings. The training process requires neither human painting experience nor stroke-tracking data.
https://arxiv.org/abs/1903.04411
In this paper we present a novel algorithm to solve the robot kinematic structure identification problem. Given a time series of data, typically obtained by processing a set of visual observations, the proposed approach identifies the ordered sequence of links associated with the kinematic chain, the joint type interconnecting each pair of consecutive links, and the input signal influencing the relative motion. Compared to the state of the art, the proposed algorithm has reduced computational costs and can also identify the sequence of joint types.
https://arxiv.org/abs/1903.04410
Mathematical theory shows us that multilayer feedforward Artificial Neural Networks (ANNs) are universal function approximators, capable of approximating any measurable function to any desired degree of accuracy. In practice, however, designing efficient neural network architectures requires significant effort and expertise. We present a new software framework called Evolutionary Cell Aided Design (ECAD) meant to aid in the exploration and design of Neural Network Architectures (NNAs) for reconfigurable hardware. The framework uses evolutionary algorithms to search for efficient hardware architectures. Given a general structure, a set of constraints and fitness functions, the framework will explore the space of hardware solutions and attempt to find the fittest solutions.
https://arxiv.org/abs/1903.02130
Convolutional Neural Networks (CNNs) have been extremely successful in solving intensive computer vision tasks. The convolutional filters used in CNNs have played a major role in this success by extracting useful features from the inputs. Recently, researchers have tried to boost the performance of CNNs by re-calibrating the feature maps produced by these filters, e.g., Squeeze-and-Excitation Networks (SENets). These approaches have achieved better performance by “exciting” the important channels or feature maps while diminishing the rest. However, in the process, architectural complexity has increased. We propose an architectural block that introduces much lower complexity than the existing methods of CNN performance boosting while performing significantly better than them. We carry out experiments on the CIFAR, ImageNet and MS-COCO datasets, and show that the proposed block can challenge the state-of-the-art results. Our method boosts the ResNet-50 architecture to perform comparably to the ResNet-152 architecture, a three times deeper network, on classification. We also show experimentally that our method is not limited to classification but also generalizes well to other tasks such as object detection.
https://arxiv.org/abs/1903.04407
Deep learning based models have had great success in object detection, but the state-of-the-art models have not yet been widely applied to biological image data. We apply for the first time an object detection model previously used on natural images to identify cells and recognize their stages in brightfield microscopy images of malaria-infected blood. Many micro-organisms like malaria parasites are still studied by expert manual inspection and hand counting. This type of object detection task is challenging due to factors like variations in cell shape, density, and color, and uncertainty of some cell classes. In addition, annotated data useful for training is scarce, and the class distribution is inherently highly imbalanced due to the dominance of uninfected red blood cells. We use the Faster Region-based Convolutional Neural Network (Faster R-CNN), one of the top performing object detection models in recent years, pre-trained on ImageNet but fine-tuned with our data, and compare it to a baseline based on a traditional approach consisting of cell segmentation, extraction of several single-cell features, and classification using random forests. To conduct our initial study, we collect and label a dataset of 1300 fields of view consisting of around 100,000 individual cells. We demonstrate that Faster R-CNN outperforms our baseline and put the results in the context of human performance.
http://arxiv.org/abs/1804.09548
Micro-expressions (MEs) are infrequent and uncontrollable facial events that can highlight emotional deception and appear in high-stakes environments. This paper proposes an algorithm for spatiotemporal ME spotting. Since MEs are unusual events, we treat them as abnormal patterns that diverge from expected Normal Facial Behaviour (NFB) patterns. NFBs correspond to facial muscle activations, eye blink/gaze events and mouth opening/closing movements that are all facial deformations but not MEs. We propose a probabilistic model to estimate the probability density function that models the spatiotemporal distributions of NFB patterns. To rank the outputs, we compute the negative log-likelihood, and we develop an adaptive thresholding technique to identify MEs from NFBs. While working only with NFB data, the main challenge is to capture intrinsic spatiotemporal features, hence we design a recurrent convolutional autoencoder for feature representation. Finally, we show that our system is superior to previous works for ME spotting.
https://arxiv.org/abs/1903.04354
Segmentation of structural and diffusion MRI (sMRI/dMRI) is usually performed independently in neuroimaging pipelines. However, some brain structures (e.g., globus pallidus, thalamus and its nuclei) can be extracted more accurately by fusing the two modalities. Following the framework of Bayesian segmentation with probabilistic atlases and unsupervised appearance modeling, we present a novel algorithm to jointly segment multi-modal sMRI/dMRI data. We propose a hierarchical likelihood term for the dMRI defined on the unit ball, which combines the Beta and Dimroth-Scheidegger-Watson distributions to model the data at each voxel. This term is integrated with a mixture of Gaussians for the sMRI data, such that the resulting joint unsupervised likelihood enables the analysis of multi-modal scans acquired with any type of MRI contrast, b-values, or number of directions, which enables wide applicability. We also propose an inference algorithm to estimate the maximum-a-posteriori model parameters from input images, and to compute the most likely segmentation. Using a recently published atlas derived from histology, we apply our method to thalamic nuclei segmentation on two datasets: HCP (state of the art) and ADNI (legacy), yielding lower required sample sizes than Bayesian segmentation with sMRI alone.
https://arxiv.org/abs/1903.04352
Scaling properties of language are a useful tool for understanding generative processes in texts. We investigate the scaling relations in citywise Twitter corpora coming from the Metropolitan and Micropolitan Statistical Areas of the United States. We observe a slightly superlinear urban scaling with the city population for the total volume of the tweets and words created in a city. We then find that a certain core vocabulary follows the scaling relationship of the bulk text, but most words are sensitive to city size, exhibiting either superlinear or sublinear urban scaling. For both regimes we can offer a plausible explanation based on the meaning of the words. We also show that the parameters of Zipf's law and Heaps' law differ on Twitter from those of other texts, and that the exponent of Zipf's law changes with city size.
https://arxiv.org/abs/1903.04329
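For reference, the three scaling relations at play, in their standard forms (stated as background, not as the paper's fitted exponents): urban scaling of a quantity $Y$ with city population $P$, Zipf's rank-frequency law, and Heaps' vocabulary growth law,

$$Y \propto P^{\beta}, \qquad f(r) \propto r^{-\alpha}, \qquad V(N) \propto N^{\gamma},$$

with $\beta > 1$ (superlinear) and $\beta < 1$ (sublinear) distinguishing the two word regimes described above.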
While robotic spatial extrusion has demonstrated a new and efficient means to fabricate 3D truss structures at architectural scale, a major challenge remains in automatically planning the extrusion sequence and robotic motion for trusses with unconstrained topologies. This paper presents the first attempt in the field to rigorously formulate the extrusion sequence and motion planning (SAMP) problem, using a CSP encoding. Furthermore, this research proposes a new hierarchical planning framework to solve extrusion SAMP problems, which usually have a long planning horizon and 3D configuration complexity. By decoupling sequence and motion planning, the planning framework is able to efficiently solve the extrusion sequence, end-effector poses, joint configurations, and transition trajectories for spatial trusses with nonstandard topologies. This paper also presents the first detailed computation data to reveal the runtime bottleneck in solving SAMP problems, which provides insight and a comparison baseline for future algorithmic development. Together with the algorithmic results, this paper also presents an open-source and modularized software implementation called Choreo that is machine-agnostic. To demonstrate the power of this algorithmic framework, three case studies, including real fabrication and simulation results, are presented.
http://arxiv.org/abs/1810.00998
Offline Signature Verification (OSV) is a challenging pattern recognition task, especially in the presence of skilled forgeries that are not available during training. This study aims to tackle its challenges and meet the substantial need for generalization in OSV by examining different loss functions for Convolutional Neural Networks (CNNs). We approach OSV by asking two questions: (1) which classification loss provides better generalization for feature learning in OSV? and (2) how does the integration of different losses into a unified multi-loss function lead to an improved learning framework? These questions are studied based on an analysis of three loss functions: cross entropy, Cauchy-Schwarz divergence, and hinge loss. Based on the complementary features of these losses, we combine them into a dynamic multi-loss function and propose a novel ensemble framework for their simultaneous use in a CNN. Our proposed Multi-Loss Snapshot Ensemble (MLSE) consists of several sequential trials. In each trial, a dominant loss function is selected from the multi-loss set, and the remaining losses act as a regularizer. Different trials learn diverse representations for each input based on a signature identification task. This multi-representation set is then employed for the verification task. An ensemble of SVMs is trained on these representations, and their decisions are finally combined by selecting the most generalizable SVM for each user. We conducted two sets of experiments based on two different protocols of OSV, i.e., writer-dependent and writer-independent, on three signature datasets: GPDS-Synthetic, MCYT, and UT-SIG. Based on the writer-dependent OSV protocol, we achieved substantial improvements over the best EERs in the literature. The results of the second set of experiments also confirmed robustness to the arrival of new users enrolled in the OSV system.
http://arxiv.org/abs/1903.06536
Motivated by the increasing appeal of robots in information-gathering missions, we study multi-agent path planning problems in which the agents must remain interconnected. We model an area by a topological graph specifying the movement and the connectivity constraints of the agents. We study the theoretical complexity of the reachability and the coverage problems of a fleet of connected agents on various classes of topological graphs. We establish the complexity of these problems on known classes, and introduce a new class called sight-moveable graphs which admit efficient algorithms.
http://arxiv.org/abs/1903.04300
Synthetic Aperture Vector Flow Imaging (SA-VFI) can visualize complex cardiac and vascular blood flow patterns at high temporal resolution with a large field of view. Convolutional neural networks (CNNs) are commonly used in image and video recognition and classification. However, most recently presented CNNs also allow for making per-pixel predictions as needed in optical flow velocimetry. To our knowledge, we demonstrate here for the first time a CNN architecture that produces 2D full flow field predictions from high frame rate SA ultrasound images using supervised learning. The CNN was initially trained using CFD-generated and augmented noiseless SA ultrasound data of a realistic vessel geometry. Subsequently, a mix of noisy simulated and real in vivo acquisitions was added to increase the network's robustness. The resulting flow field of the CNN matched the ground truth accurately, with an endpoint-error percentage between 6.5% and 14.5%. Furthermore, when confronted with an unknown geometry of an arterial bifurcation, the CNN was able to predict an accurate flow field, indicating its ability to generalize. Remarkably, the CNN also performed well for rotational flows, which usually require advanced, computationally intensive VFI methods. We have demonstrated that convolutional neural networks can be used to estimate complex multidirectional flow.
http://arxiv.org/abs/1903.06254
In this paper, we aim to provide an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to train. Nowadays, most deep learning model training still relies on the backpropagation algorithm, in which the model variables are updated iteratively until convergence with gradient descent based optimization algorithms. Besides the conventional vanilla gradient descent algorithm, many gradient descent variants have also been proposed in recent years to improve the learning performance, including Momentum, Adagrad, Adam, Gadam, etc., each of which is introduced in this paper.
http://arxiv.org/abs/1903.03614
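As a quick illustration of the update rules this tutorial surveys, here is a minimal sketch of vanilla gradient descent, Momentum, and Adam for a single parameter; the step sizes and decay constants are common defaults, not values prescribed by the paper:

```python
import numpy as np

def vanilla_step(theta, grad, lr=0.01):
    # theta <- theta - lr * grad
    return theta - lr * grad

def momentum_step(theta, grad, v, lr=0.01, rho=0.9):
    # v accumulates an exponentially decaying sum of past gradients
    v = rho * v + lr * grad
    return theta - v, v

def adam_step(theta, grad, m, s, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # first/second moment estimates with bias correction (t starts at 1)
    m = b1 * m + (1 - b1) * grad
    s = b2 * s + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    s_hat = s / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# usage: minimize f(theta) = theta^2 with Adam
theta, m, s = 5.0, 0.0, 0.0
for t in range(1, 201):
    grad = 2.0 * theta                       # df/dtheta
    theta, m, s = adam_step(theta, grad, m, s, t, lr=0.1)
print(theta)  # approaches 0
```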
Monocular Odometry systems can be broadly categorized as being either Direct, Indirect, or a hybrid of both. While Indirect systems process an alternative image representation to compute geometric residuals, Direct methods process the image pixels directly to generate photometric residuals. Both paradigms have distinct but often complementary properties. This paper presents a Unified Formulation for Visual Odometry, referred to as UFVO, with the following key contributions: (1) a tight coupling of photometric (Direct) and geometric (Indirect) measurements using a joint multi-objective optimization, (2) the use of a utility function as a decision maker that incorporates prior knowledge on both paradigms, (3) descriptor sharing, where a feature can have more than one type of descriptor and its different descriptors are used for tracking and mapping, (4) the depth estimation of both corner features and pixel features within the same map using an inverse depth parametrization, and (5) a corner and pixel selection strategy that extracts both types of information, while promoting a uniform distribution over the image domain. Experiments show that our proposed system can handle large inter-frame motions, inherits the sub-pixel accuracy of direct methods, can run efficiently in real-time, and can generate an Indirect map representation at a marginal computational cost compared to traditional Indirect systems, all while outperforming the state of the art in Direct, Indirect and hybrid systems.
https://arxiv.org/abs/1903.04253
Modern handwritten text recognition techniques employ deep recurrent neural networks. These techniques are especially effective when a large amount of annotated data is available for parameter estimation. Data augmentation can be used to enhance the performance of the systems when data is scarce. Manifold Mixup is a modern data augmentation method that melds two images, or the feature maps corresponding to these images, and fuses the targets accordingly. We propose to apply Manifold Mixup to text recognition while adapting it to work with a Connectionist Temporal Classification cost. We show that Manifold Mixup improves text recognition results on various languages and datasets.
https://arxiv.org/abs/1903.04246
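A minimal sketch of the Mixup interpolation underlying the method, assuming a PyTorch setting and one-hot classification targets for brevity; Manifold Mixup applies the same blend to hidden feature maps, and the CTC-specific target fusion described above is not reproduced here:

```python
import torch

def mixup(features, targets, alpha=0.2):
    """Blend random pairs of examples (or hidden feature maps) and targets.

    features: (B, ...) tensor; targets: (B, C) one-hot tensor.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(features.size(0))
    mixed_x = lam * features + (1.0 - lam) * features[perm]
    mixed_y = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_x, mixed_y
```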
As deployment of automated vehicles increases, so does the rate at which they are exposed to critical traffic situations. Such situations, e.g. a late detected pedestrian in the vehicle path, require operation at the handling limits in order to maximize the capacity to avoid an accident. Also, the physical limitations of the vehicle typically vary in time due to local road and weather conditions. In this paper, we tackle the problem of trajectory planning and control at the limits of handling under time varying constraints, by adapting to local traction limitations. The proposed method is based on Real Time Iteration Sequential Quadratic Programming (RTI-SQP) augmented with state space sampling, which we call Sampling Augmented Adaptive RTI-SQP (SAA-SQP). Through extensive numerical simulations we demonstrate that our method increases the vehicle’s capacity to avoid late detected obstacles compared to the traditional planning/tracking approaches, as a direct consequence of safe operating constraint adaptation in real time.
https://arxiv.org/abs/1903.04240
Data similarity is a key concept in many data-driven applications, and many algorithms are sensitive to similarity measures. To tackle this fundamental problem, automatic learning of similarity information from data via self-expression has been developed and successfully applied in various models, such as low-rank representation, sparse subspace learning, and semi-supervised learning. However, it just tries to reconstruct the original data, and some valuable information, e.g., the manifold structure, is largely ignored. In this paper, we argue that it is beneficial to preserve the overall relations when we extract similarity information. Specifically, we propose a novel similarity learning framework that minimizes the reconstruction error of kernel matrices, rather than the reconstruction error of the original data adopted by existing work. Taking the clustering task as an example to evaluate our method, we observe considerable improvements compared to other state-of-the-art methods. More importantly, our proposed framework is very general and provides a novel and fundamental building block for many other similarity-based tasks. Besides, our proposed kernel-preserving scheme opens up a large number of possibilities for embedding high-dimensional data into a low-dimensional space.
https://arxiv.org/abs/1903.04235
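To make the shift concrete (standard self-expression algebra, hedged as a sketch rather than the paper's exact objective): classical self-expressive learning seeks a similarity matrix $S$ with $X \approx XS$, while the kernel-preserving variant reconstructs the kernel matrix $K = \phi(X)^{\top}\phi(X)$ instead,

$$\min_{S}\ \|X - XS\|_F^2 \quad\longrightarrow\quad \min_{S}\ \operatorname{Tr}\!\left(K - 2KS + S^{\top} K S\right) + \lambda\,\mathcal{R}(S),$$

where the trace term equals $\|\phi(X) - \phi(X)S\|_F^2$ for symmetric $K$ and $\mathcal{R}(S)$ is a regularizer such as a low-rank or sparsity penalty.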
6D object pose estimation is an important task that determines the 3D position and 3D rotation of an object in camera-centred coordinates. Solving this task enables promising solutions for various problems related to scene understanding, augmented reality, and the control and navigation of robots. Recent developments in visual depth sensors and the low-cost availability of depth data significantly facilitate object pose estimation. Using depth information from RGB-D sensors, substantial progress has been made in the last decade by methods addressing challenges such as viewpoint variability, occlusion and clutter, and similar-looking distractors. Particularly, with the recent advent of convolutional neural networks, RGB-only solutions have been presented. However, improved results have only been reported for recovering the pose of known instances, i.e., for instance-level object pose estimation tasks. More recently, state-of-the-art approaches aim to solve the object pose estimation problem at the level of categories, recovering the 6D pose of unknown instances. To this end, they address the challenges of category-level tasks such as distribution shift between source and target domains, high intra-class variations, and shape discrepancies between objects.
https://arxiv.org/abs/1903.04229
Most image completion methods produce only one result for each masked input, although there may be many reasonable possibilities. In this paper, we present an approach for pluralistic image completion: the task of generating multiple and diverse plausible solutions for image completion. A major challenge faced by learning-based approaches is that there is usually only one ground truth training instance per label. As such, sampling from conditional VAEs still leads to minimal diversity. To overcome this, we propose a novel and probabilistically principled framework with two parallel paths. One is a reconstructive path that extends the VAE through a latent space that covers all partial images with different mask sizes, and imposes priors that adapt to the number of pixels. The other is a generative path for which the conditional prior is coupled to distributions obtained in the reconstructive path. Both are supported by GANs. We also introduce a new short+long term attention layer that exploits distant relations among decoder and encoder features, improving appearance consistency. When tested on datasets with buildings (Paris), faces (CelebA-HQ), and natural images (ImageNet), our method not only generated higher-quality completion results, but also multiple diverse plausible outputs.
https://arxiv.org/abs/1903.04227
We present a set of probabilistic models applied to binary classification as defined in the DEFT'05 challenge. The challenge consisted of a mixture of two different problems in Natural Language Processing: identification of author (a sequence of François Mitterrand's sentences might have been inserted into a speech of Jacques Chirac) and thematic break detection (the subjects addressed by the two authors are supposed to be different). Markov chains, Bayes models and an adaptive process have been used to identify the paternity of these sequences. A probabilistic model of the internal coherence of speeches has been employed to identify thematic breaks. Adding this model has been shown to improve the quality of the results. A comparison with different approaches demonstrates the superiority of a strategy that combines learning, coherence and adaptation. Applied to the DEFT'05 test data, the results in terms of precision (0.890), recall (0.955) and F-score (0.925) are very promising.
https://arxiv.org/abs/1903.07397
Automated detection of cervical cancer cells or cell clumps has the potential to significantly reduce error and increase productivity in cervical cancer screening. However, most traditional methods rely on the success of accurate cell segmentation and discriminative hand-crafted feature extraction. Recently, deep learning-based methods have emerged that train convolutional neural networks (CNNs) to classify image patches, but they are computationally expensive. In this paper we propose to exploit contemporary object detection methods for cervical cancer detection. To deal with the limited size of training samples, we build a comparison classifier into the state-of-the-art two-stage object detection method, classifying proposals by comparison with reference images of each category. In addition, we propose to learn the reference images of the background from the data instead of manually choosing them by heuristic rules. This architecture, called the Comparison detector, shows significant improvement on a small-size dataset, achieving a mean Average Precision (mAP) of 26.3% and an Average Recall (AR) of 35.7%, both improving by about 20 points over the baseline model. Moreover, the Comparison detector achieves the same mAP performance as the current state-of-the-art model when trained on the medium-size dataset, and improves AR by 4 points. Our method is promising for the development of automation-assisted cervical cancer screening systems.
http://arxiv.org/abs/1810.05952
Machine learning models are becoming commonplace in the domain of medical imaging, and with these methods comes an ever-increasing need for more data. However, to preserve patient anonymity it is frequently impractical or prohibited to transfer protected health information (PHI) between institutions. Additionally, due to the nature of some studies, there may not be a large public dataset available on which to train models. To address this conundrum, we analyze the efficacy of transferring the model itself in lieu of data between different sites. By doing so we accomplish two goals: 1) the model gains access to training on a larger dataset that it could not normally obtain and 2) the model better generalizes, having trained on data from separate locations. In this paper, we implement multi-site learning with disparate datasets from the National Institutes of Health (NIH) and Vanderbilt University Medical Center (VUMC) without compromising PHI. Three neural networks are trained to convergence on a computed tomography (CT) brain hematoma segmentation task: one only with NIH data, one only with VUMC data, and one multi-site model alternating between NIH and VUMC data. Resultant lesion masks with the multi-site model attain an average Dice similarity coefficient of 0.64 and the automatically segmented hematoma volumes correlate to those done manually with a Pearson correlation coefficient of 0.87, corresponding to an 8% and 5% improvement, respectively, over the single-site model counterparts.
https://arxiv.org/abs/1903.04207
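A schematic sketch of the alternating multi-site schedule described above, in which only model weights (never PHI) move between institutions; the loader names and the train_one_epoch callable are hypothetical placeholders, not the authors' code:

```python
def multisite_train(model, site_loaders, train_one_epoch, rounds=10):
    """Alternate training across institutions: the model travels between
    sites each round, while the protected imaging data never leaves them.

    site_loaders: dict such as {"NIH": nih_loader, "VUMC": vumc_loader}.
    train_one_epoch: callable(model, loader) -> model, run inside a site.
    """
    for _ in range(rounds):
        for name, loader in site_loaders.items():
            model = train_one_epoch(model, loader)  # local update at `name`
    return model
```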
Nowadays, the majority of state-of-the-art monocular depth estimation techniques are based on supervised deep learning models. However, collecting RGB images with associated depth maps is a very time consuming procedure. Therefore, recent works have proposed deep architectures that address the monocular depth prediction task as a reconstruction problem, thus avoiding the need to collect ground-truth depth. Following these works, we propose a novel self-supervised deep model for estimating depth maps. Our framework exploits two main strategies: refinement via cycle-inconsistency and distillation. Specifically, first a student network is trained to predict a disparity map so as to recover, from a frame in one camera view, the associated image in the opposite view. Then, a backward cycle network is applied to the generated image to re-synthesize the input image, estimating the opposite disparity. A third network exploits the inconsistency between the original and the reconstructed input frame in order to output a refined depth map. Finally, knowledge distillation is exploited to transfer information from the refinement network to the student. Our extensive experimental evaluation demonstrates the effectiveness of the proposed framework, which outperforms state-of-the-art unsupervised methods on the KITTI benchmark.
https://arxiv.org/abs/1903.04202
In this paper, we investigate the knowledge distillation strategy for training small semantic segmentation networks by making use of large networks. We start from the straightforward scheme, pixel-wise distillation, which applies the distillation scheme adopted for image classification and performs knowledge distillation for each pixel separately. We further propose to distill the structured knowledge from large networks to small networks, motivated by the fact that semantic segmentation is a structured prediction problem. We study two structured distillation schemes: (i) pair-wise distillation, which distills the pairwise similarities, and (ii) holistic distillation, which uses a GAN to distill holistic knowledge. The effectiveness of our knowledge distillation approaches is demonstrated by extensive experiments on three scene parsing datasets: Cityscapes, CamVid and ADE20K.
https://arxiv.org/abs/1903.04197
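The pixel-wise scheme has a compact form; the sketch below, a hedged PyTorch rendering rather than the authors' exact loss, treats every pixel as an independent classification problem and matches the student's softened class distribution to the teacher's via KL divergence (the pair-wise and holistic GAN terms are not shown):

```python
import torch.nn.functional as F

def pixelwise_distillation(student_logits, teacher_logits, T=1.0):
    """Per-pixel KL(teacher || student) over (B, C, H, W) score maps."""
    B, C, H, W = student_logits.shape
    # flatten spatial dims so each pixel is one C-way classification problem
    s = student_logits.permute(0, 2, 3, 1).reshape(-1, C)
    t = teacher_logits.permute(0, 2, 3, 1).reshape(-1, C)
    log_p_s = F.log_softmax(s / T, dim=1)
    p_t = F.softmax(t / T, dim=1)
    # T^2 rescaling keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)
```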
Value-based reinforcement-learning algorithms are currently state-of-the-art in model-free discrete-action settings, and tend to outperform actor-critic algorithms. We argue that actor-critic algorithms are currently limited by their need for an on-policy critic, which severely constrains how the critic is learned. We propose Bootstrapped Dual Policy Iteration (BDPI), a novel model-free actor-critic reinforcement-learning algorithm for continuous states and discrete actions, with off-policy critics. Off-policy critics are compatible with experience replay, ensuring high sample-efficiency, without the need for off-policy corrections. The actor, by slowly imitating the average greedy policy of the critics, leads to high-quality and state-specific exploration, which we show approximates Thompson sampling. Because the actor and critics are fully decoupled, BDPI is remarkably stable and, contrary to other state-of-the-art algorithms, unusually forgiving of poorly-configured hyper-parameters. BDPI is significantly more sample-efficient than Bootstrapped DQN, PPO, A3C and ACKTR on a variety of tasks. Source code: https://github.com/vub-ai-lab/bdpi.
http://arxiv.org/abs/1903.04193
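A hedged sketch of the actor update described above: the actor's action distribution in a state is moved slowly toward the average of the critics' greedy policies (the step size lam and the exact functional form are illustrative assumptions, not the authors' formulation):

```python
import numpy as np

def actor_update(pi_s, critics_q_s, lam=0.05):
    """pi_s: (A,) actor action probabilities for one state.
    critics_q_s: (K, A) Q-values from K off-policy critics for that state."""
    # average of the critics' greedy (one-hot) policies
    greedy = np.zeros_like(pi_s)
    for q in critics_q_s:
        greedy[np.argmax(q)] += 1.0 / len(critics_q_s)
    # slow imitation: small step from the current policy toward the average
    new_pi = (1.0 - lam) * pi_s + lam * greedy
    return new_pi / new_pi.sum()
```

Averaging over bootstrapped critics is what yields the Thompson-sampling-like, state-specific exploration mentioned above.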
Suppose one is faced with the challenge of tissue segmentation in MR images, without annotators at their center to provide labeled training data. One option is to go to another medical center for a trained classifier. Sadly, tissue classifiers do not generalize well across centers due to voxel intensity shifts caused by center-specific acquisition protocols. However, certain aspects of segmentations, such as spatial smoothness, remain relatively consistent and can be learned separately. Here we present a smoothness prior that is fit to segmentations produced at another medical center. This informative prior is presented to an unsupervised Bayesian model. The model clusters the voxel intensities, such that it produces segmentations that are similarly smooth to those of the other medical center. In addition, the unsupervised Bayesian model is extended to a semi-supervised variant, which needs no visual interpretation of clusters into tissues.
https://arxiv.org/abs/1903.04191
Ambiguous annotation criteria lead to divergent Chinese Word Segmentation (CWS) datasets with various granularities. Multi-criteria learning leverages the annotation style of individual datasets and mines their common basic knowledge. In this paper, we propose a domain adaptive segmenter to capture the diverse criteria of these datasets. Our model is based on Bidirectional Encoder Representations from Transformers (BERT), which is responsible for introducing external knowledge. We also optimize its computational efficiency via model pruning, quantization, and compiler optimization. Experiments show that our segmenter outperforms previous results on 10 CWS datasets and is faster than the previous state-of-the-art Bi-LSTM-CRF model.
https://arxiv.org/abs/1903.04190
This article is a collective position paper from the Wimmics research team, expressing our vision of how Web graph data technologies should evolve in the future in order to ensure a high level of interoperability between the many types of applications that produce and consume graph data. Wimmics stands for Web-Instrumented Man-Machine Interactions, Communities, and Semantics. We are a joint research team between INRIA Sophia Antipolis-Méditerranée and I3S (CNRS and Université Côte d'Azur). Our challenge is to bridge formal semantics and social semantics on the web. Our research areas are graph-oriented knowledge representation, reasoning and operationalization to model and support actors, actions and interactions in web-based epistemic communities. The application of our research is supporting and fostering interactions in online communities and management of their resources. In this position paper, we emphasize the need to extend the semantic Web standard stack to address and fulfill new graph data needs, as well as the importance of remaining compatible with existing recommendations, in particular the RDF stack, to avoid the painful duplication of models, languages, frameworks, etc. The following sections group motivations for different directions of work and collect reasons for the creation of a working group on RDF 2.0 and other recommendations of the RDF family.
http://arxiv.org/abs/1903.04181
In this paper, a novel two-stream architecture for the task of temporal action proposal generation in long, untrimmed videos is presented. Inspired by recent advances in the field of human action recognition utilizing 3D convolutions in combination with two-stream networks, and based on the Single-Stream Temporal Action Proposals (SST) architecture, four different two-stream architectures utilizing sequences of images on one stream and images of optical flow on the other stream are investigated. The four architectures fuse the two separate streams at different depths in the model; for each of them, a broad range of parameters is investigated systematically and an optimal parametrization is empirically determined. The experiments on action and sports datasets show that all four two-stream architectures are able to outperform the original single-stream SST and achieve state-of-the-art results. Additional experiments, in which the formerly used optical flow method of Brox was exchanged for FlowNet2, revealed that the improvements are not restricted to a single method of calculating optical flow.
https://arxiv.org/abs/1903.04176
Although SGD requires shuffling the training data between epochs, currently none of the word-level language modeling systems do this. Naively shuffling all sentences in the training data would not permit the model to learn inter-sentence dependencies. Here we present a method that partially shuffles the training data between epochs. This method makes each batch random, while keeping most sentence ordering intact. It achieves new state of the art results on word-level language modeling on both the Penn Treebank and WikiText-2 datasets.
https://arxiv.org/abs/1903.04167
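One way to realize a partial shuffle consistent with this description is to rotate each batch stream by a random offset at the start of every epoch, so batch composition is randomized while nearly all local sentence ordering survives; this is a hedged sketch of the idea, not necessarily the paper's exact procedure:

```python
import random

def partial_shuffle(streams):
    """Rotate each batch stream by a random offset at the start of an epoch.

    streams: list of token lists, one per batch row. A rotation randomizes
    which tokens land in which batch while breaking the ordering at only
    one point per row, preserving inter-sentence dependencies elsewhere.
    """
    shuffled = []
    for row in streams:
        k = random.randrange(len(row))
        shuffled.append(row[k:] + row[:k])  # cyclic shift by k tokens
    return shuffled
```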
This paper describes a simple UCCA semantic graph parsing approach. The key idea is to convert a UCCA semantic graph into a constituent tree, in which extra labels are deliberately designed to mark remote and discontinuous links for future recovery. In this way, we can make use of existing syntactic parsing techniques. Based on the data statistics, we recover discontinuous links directly according to the output labels of the constituent parser, and use a biaffine classification model to recover the more complex remote links. The classification model and the constituent parser are trained simultaneously under the multi-task learning framework. We use multilingual BERT as extra features in the open tracks. Our system ranks first in the six English/German closed/open tracks among seven participating systems. For the seventh, cross-lingual track, where there is little training data for French, we propose a language embedding approach to utilize the English and German training data, and our result ranks second.
https://arxiv.org/abs/1903.04153
We aim to study the multi-scale receptive fields of a single convolutional neural network to detect faces of varied scales. This paper presents our Multi-Scale Receptive Field Face Detector (MSFD), which has superior performance on detecting faces at different scales and enjoys real-time inference speed. MSFD agglomerates context and texture through a hierarchical structure. The additional information and rich receptive fields bring significant improvement while adding only marginal time consumption. We also propose an anchor assignment strategy which can cover faces with a wide range of scales to improve the recall rate of small faces and rotated faces. To reduce the false positive rate, we train our detector with the focal loss, which keeps easy samples from overwhelming the training. As a result, MSFD reaches superior results on the FDDB, Pascal-Faces and WIDER FACE datasets, and can run at 31 FPS on GPU for VGA-resolution images.
https://arxiv.org/abs/1903.04147
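For reference, the focal loss mentioned above (Lin et al.'s standard form, not an MSFD-specific variant) down-weights well-classified examples:

$$\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma}\,\log(p_t),$$

where $p_t$ is the predicted probability of the true class; the modulating factor $(1-p_t)^{\gamma}$ with $\gamma > 0$ shrinks the loss of easy samples so the abundant easy negatives cannot dominate training.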
Text classification plays a vital role today, especially with the intensive use of social networking media. Recently, different architectures of convolutional neural networks have been used for text classification, in which one-hot vector and word embedding methods are commonly used. This paper presents a new language-independent word encoding method for text classification. The proposed model converts raw text data to a low-level feature dimension with minimal or no preprocessing steps by using a new approach called binary unique number of word (BUNOW). BUNOW allows each unique word to have an integer ID in a dictionary that is represented as a k-dimensional vector of its binary equivalent. The output vector of this encoding is fed into a convolutional neural network (CNN) model for classification. Moreover, the proposed model reduces the neural network parameters, allows faster computation with fewer network layers, keeps the word as the atomic representation of the document as in word-level models, and decreases the memory consumption of character-level representations. The provided CNN model is able to work with other languages or multilingual text without the need for any changes in the encoding method. The model outperforms the character-level and very deep character-level CNN models in terms of accuracy, network parameters, and memory consumption; the results show a total classification accuracy of 91.99% and error of 8.01% using AG's News dataset, compared to the state-of-the-art methods with a total classification accuracy of 91.45% and error of 8.55%, in addition to a reduction in the input feature vector and neural network parameters by 62% and 34%, respectively.
https://arxiv.org/abs/1903.04146
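A hedged sketch of the encoding as described: each unique word receives an integer ID from a dictionary and is represented as the k-dimensional vector of that ID's binary digits (the bit width k, the ID-assignment order, and the out-of-vocabulary convention below are illustrative assumptions):

```python
def build_vocab(tokens):
    """Assign each unique word an integer ID, starting from 1 (0 reserved)."""
    vocab = {}
    for w in tokens:
        if w not in vocab:
            vocab[w] = len(vocab) + 1
    return vocab

def bunow_vector(word, vocab, k=20):
    """Represent a word's integer ID as a k-dimensional binary vector."""
    wid = vocab.get(word, 0)  # 0 for unknown words (an assumption)
    return [(wid >> i) & 1 for i in reversed(range(k))]

tokens = "the cat sat on the mat".split()
vocab = build_vocab(tokens)             # the=1, cat=2, sat=3, on=4, mat=5
print(bunow_vector("cat", vocab, k=8))  # [0, 0, 0, 0, 0, 0, 1, 0]
```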
Deep generative models are stochastic neural networks capable of learning the distribution of data so as to generate new samples. The Conditional Variational Autoencoder (CVAE) is a powerful deep generative model that maximizes a lower bound on the training data log-likelihood. The CVAE structure contains an appropriate regularizer, which makes it applicable for suitably constraining the solution space when solving ill-posed problems and provides high generalization power. Given the stochastic prediction characteristic of the CVAE, depending on the problem at hand, it is desirable to be able to control the uncertainty in CVAE predictions. Therefore, in this paper we analyze the impact of the CVAE's condition on the diversity of solutions given by our designed CVAE in 3D shape inverse rendering as a prediction problem. The experimental results on the ModelNet10 and ShapeNet datasets show the appropriate performance of our designed CVAE and verify the hypothesis: “the more informative the conditions in terms of object pose are, the less diverse the CVAE predictions are”.
https://arxiv.org/abs/1903.04144
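For reference, the lower bound being maximized is the standard conditional ELBO (textbook form, with $x$ the data, $c$ the condition, and $z$ the latent code):

$$\log p_{\theta}(x \mid c) \;\ge\; \mathbb{E}_{q_{\phi}(z \mid x, c)}\!\left[\log p_{\theta}(x \mid z, c)\right] \;-\; \mathrm{KL}\!\left(q_{\phi}(z \mid x, c)\,\|\,p_{\theta}(z \mid c)\right),$$

where the KL term is the regularizer referred to above: the more the condition $c$ pins down the output, the more concentrated the learned distributions, and hence the less diverse the sampled predictions.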
This paper presents a summary of the 2019 Unconstrained Ear Recognition Challenge (UERC), the second in a series of group benchmarking efforts centered around the problem of person recognition from ear images captured in uncontrolled settings. The goal of the challenge is to assess the performance of existing ear recognition techniques on a challenging large-scale ear dataset and to analyze the performance of the technology from various viewpoints, such as generalization ability to unseen data characteristics, sensitivity to rotations, occlusions and image resolution, and performance bias on sub-groups of subjects selected based on demographic criteria, i.e., gender and ethnicity. Research groups from 12 institutions entered the competition and submitted a total of 13 recognition approaches ranging from descriptor-based methods to deep-learning models. The majority of submissions focused on deep learning approaches and hybrid techniques combining hand-crafted and learned image descriptors. Our analysis shows that hybrid and deep-learning-based approaches significantly outperform traditional hand-crafted approaches. We argue that this is a good indicator of where ear recognition will be heading in the future. Furthermore, the results in general improve upon UERC 2017 and display the steady advancement of ear recognition.
https://arxiv.org/abs/1903.04143
Imitation learning has proven to be useful for many real-world problems, but approaches such as behavioral cloning suffer from data mismatch and compounding error issues. One attempt to address these limitations is the DAgger algorithm, which uses the state distribution induced by the novice to sample corrective actions from the expert. Such sampling schemes, however, require the expert to provide action labels without being fully in control of the system. This can decrease safety and, when using humans as experts, is likely to degrade the quality of the collected labels due to perceived actuator lag. In this work, we propose HG-DAgger, a variant of DAgger that is more suitable for interactive imitation learning from human experts in real-world systems. In addition to training a novice policy, HG-DAgger also learns a safety threshold for a model-uncertainty-based risk metric that can be used to predict the performance of the fully trained novice in different regions of the state space. We evaluate our method on both a simulated and real-world autonomous driving task, and demonstrate improved performance over both DAgger and behavioral cloning.
http://arxiv.org/abs/1810.02890
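A schematic sketch of the human-gated data collection loop, under the assumption that the human expert can take control at will; the env interface and the method names (wants_control, act, etc.) are hypothetical placeholders, not the authors' API:

```python
def hg_dagger_rollout(env, novice, expert, dataset):
    """One human-gated rollout: the expert takes over whenever they judge it
    necessary, and only expert-controlled steps yield training labels."""
    obs, done = env.reset(), False
    while not done:
        if expert.wants_control(obs):      # gating decision made by the human
            action = expert.act(obs)       # expert is fully in control
            dataset.append((obs, action))  # label collected while expert drives
        else:
            action = novice.act(obs)       # novice drives; no label recorded
        obs, done = env.step(action)       # simplified step interface
    return dataset                         # novice is retrained on the aggregate
```

Because labels come only from segments where the human is actually in control, the perceived actuator lag that degrades label quality in vanilla DAgger is avoided.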