Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

SCALP: Superpixels with Contour Adherence using Linear Path

2019-03-17

Rémi Giraud, Vinh-Thong Ta, Nicolas Papadakis

arXiv_CV

arXiv_CV Segmentation Detection
Abstract

Superpixel decomposition methods are generally used as a pre-processing step to speed up image processing tasks. They group the pixels of an image into homogeneous regions while trying to respect existing contours. For all state-of-the-art superpixel decomposition methods, a trade-off is made between 1) computational time, 2) adherence to image contours and 3) regularity and compactness of the decomposition. In this paper, we propose a fast method to compute Superpixels with Contour Adherence using Linear Path (SCALP) in an iterative clustering framework. The distance computed when trying to associate a pixel to a superpixel during the clustering is enhanced by considering the linear path to the superpixel barycenter. The proposed framework produces regular and compact superpixels that adhere to the image contours. We provide a detailed evaluation of SCALP on the standard Berkeley Segmentation Dataset. The obtained results outperform state-of-the-art methods in terms of standard superpixel and contour detection metrics.
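
The abstract describes enhancing the pixel-to-superpixel distance with the linear path to the barycenter. Below is a minimal sketch of that idea, not the authors' exact distance. It assumes a SLIC-style color-plus-spatial distance and adds a color term averaged over pixels sampled along the straight segment to the barycenter. The parameters `compactness`, `spatial_scale`, `lambda_path` and `steps` are hypothetical.

```python
import math

def linear_path(p, q, steps):
    """Sample steps+1 integer points on the segment from p to q (inclusive)."""
    (x0, y0), (x1, y1) = p, q
    return [(round(x0 + k / steps * (x1 - x0)), round(y0 + k / steps * (y1 - y0)))
            for k in range(steps + 1)]

def scalp_like_distance(img, pixel, centroid_xy, centroid_color,
                        spatial_scale=10.0, compactness=10.0,
                        lambda_path=0.5, steps=8):
    """Hypothetical SLIC-style distance enhanced with a linear-path color term.

    img: 2D list of color tuples indexed as img[y][x].
    centroid_xy: superpixel barycenter (x, y); centroid_color: its mean color.
    """
    x, y = pixel
    d_color = math.dist(img[y][x], centroid_color)
    d_space = math.dist(pixel, centroid_xy)
    # Mean color distance between path pixels and the superpixel color:
    # crossing a contour on the way to the barycenter raises this term.
    target = (round(centroid_xy[0]), round(centroid_xy[1]))
    path = linear_path(pixel, target, steps)
    d_path = sum(math.dist(img[py][px], centroid_color) for px, py in path) / len(path)
    return d_color + lambda_path * d_path + (compactness / spatial_scale) * d_space
```

With this term, two pixels of identical color at the same distance from the barycenter get different distances if only one of them is separated from it by a contour.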

URL

http://arxiv.org/abs/1903.07149

PDF

http://arxiv.org/pdf/1903.07149
Robust Shape Regularity Criteria for Superpixel Evaluation

2019-03-17

Rémi Giraud, Vinh-Thong Ta, Nicolas Papadakis

arXiv_CV

arXiv_CV Tracking Recognition
Abstract

Regular decompositions are necessary for most superpixel-based object recognition or tracking applications. So far in the literature, the regularity or compactness of a superpixel shape has mainly been measured by its circularity. In this work, we first demonstrate that such a measure is not suited to superpixel evaluation, since it does not directly express regularity but rather circular appearance. Then, we propose a new metric that considers several aspects of shape regularity: convexity, balanced repartition, and contour smoothness. Finally, we demonstrate that our measure is robust to scale and noise and enables a more relevant comparison of superpixel methods.
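
The circularity criterion questioned above is commonly defined as $C = 4\pi A / P^2$, which equals 1 for a disk. Here is a minimal sketch on polygon shapes, which are assumed for illustration (superpixels are pixel sets in practice). It computes the area with the shoelace formula. It shows that a square, a perfectly regular grid-aligned shape, scores only $\pi/4$.

```python
import math

def polygon_area(pts):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0

def polygon_perimeter(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:] + pts[:1]))

def circularity(pts):
    """4*pi*A / P^2: equals 1 for a disk, smaller for other shapes."""
    return 4 * math.pi * polygon_area(pts) / polygon_perimeter(pts) ** 2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(round(circularity(square), 4))  # 0.7854, i.e. pi/4
```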

URL

http://arxiv.org/abs/1903.07146

PDF

http://arxiv.org/pdf/1903.07146
Inverse Path Tracing for Joint Material and Lighting Estimation

2019-03-17

Dejan Azinović, Tzu-Mao Li, Anton Kaplanyan, Matthias Nießner

arXiv_CV

arXiv_CV Optimization Gradient_Descent
Abstract

Modern computer vision algorithms have brought significant advances to 3D geometry reconstruction. However, illumination and material reconstruction remain less studied, with current approaches assuming very simplified models for materials and illumination. We introduce Inverse Path Tracing, a novel approach to jointly estimate the material properties of objects and light sources in indoor scenes by using an invertible light transport simulation. We assume a coarse geometry scan, along with corresponding images and camera poses. The key contribution of this work is an accurate and simultaneous retrieval of light sources and physically based material properties (e.g., diffuse reflectance, specular reflectance, and roughness) for the purpose of editing and re-rendering the scene under new conditions. To this end, we introduce a novel optimization method using a differentiable Monte Carlo renderer that computes derivatives with respect to the estimated unknown illumination and material properties. This enables joint optimization for physically correct light transport and material models using a tailored stochastic gradient descent.

URL

http://arxiv.org/abs/1903.07145

PDF

http://arxiv.org/pdf/1903.07145
Topic-Guided Variational Autoencoders for Text Generation

2019-03-17

Wenlin Wang, Zhe Gan, Hongteng Xu, Ruiyi Zhang, Guoyin Wang, Dinghan Shen, Changyou Chen, Lawrence Carin

arXiv_CL

arXiv_CL Text_Generation Inference
Abstract

We propose a topic-guided variational autoencoder (TGVAE) model for text generation. Distinct from existing variational autoencoder (VAE) based approaches, which assume a simple Gaussian prior for the latent code, our model specifies the prior as a Gaussian mixture model (GMM) parametrized by a neural topic module. Each mixture component corresponds to a latent topic, which provides guidance to generate sentences under the topic. The neural topic module and the VAE-based neural sequence module in our model are learned jointly. In particular, a sequence of invertible Householder transformations is applied to endow the approximate posterior of the latent code with high flexibility during model inference. Experimental results show that our TGVAE outperforms alternative approaches on both unconditional and conditional text generation, and can generate semantically meaningful sentences on various topics.
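
Each Householder transformation reflects the latent code: $z' = (I - 2 v v^\top / \lVert v \rVert^2)\, z$. Below is a minimal pure-Python sketch. It assumes the reflection vectors $v$ are given; in a model like TGVAE they would come from the inference network, and the constants here are hypothetical. Each reflection is orthogonal, so the norm of $z$ is preserved and the log-determinant term of the flow is zero.

```python
def householder(z, v):
    """Reflect z across the hyperplane orthogonal to v: z - 2 (v.z / v.v) v."""
    vz = sum(a * b for a, b in zip(v, z))
    vv = sum(a * a for a in v)
    return [zi - 2.0 * vz / vv * vi for zi, vi in zip(z, v)]

def householder_flow(z, vs):
    """Apply a sequence of Householder reflections to the latent code."""
    for v in vs:
        z = householder(z, v)
    return z

z0 = [1.0, 2.0, 3.0]
zk = householder_flow(z0, [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(zk)  # [-2.0, 1.0, 3.0]
```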

URL

http://arxiv.org/abs/1903.07137

PDF

http://arxiv.org/pdf/1903.07137
Question Answering via Web Extracted Tables and Pipelined Models

2019-03-17

Bhavya Karki, Fan Hu, Nithin Haridas, Suhail Barot, Zihua Liu, Lucile Callebert, Matthias Grabmair, Anthony Tomasic

arXiv_CL

arXiv_CL QA Classification
Abstract

In this paper, we describe a dataset and baseline results for question answering that utilizes web tables. The dataset contains commonly asked questions on the web and their corresponding answers, found in tables on websites. Our dataset is novel in that every question is paired with a table of a different signature. In particular, the dataset contains two classes of tables: entity-instance tables and key-value tables. Each QA instance comprises a table of either kind, a natural language question, and a corresponding structured SQL query. We build our model by dividing question answering into several tasks, including table retrieval and question element classification, and conduct experiments to measure the performance of each task. We extract various features specific to each task and compose a full pipeline which constructs the SQL query from its parts. Our work provides qualitative results and error analysis for each task, and identifies in detail the reasoning required to generate SQL expressions from natural language questions. This analysis of reasoning informs future models based on neural machine learning.
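
The final pipeline stage, constructing the SQL query from its parts, can be sketched as follows. This assumes upstream classifiers have already produced the table name, the selected column, and (column, operator, value) conditions. The `QueryParts` structure and its field names are hypothetical, not the dataset's schema.

```python
from dataclasses import dataclass, field

@dataclass
class QueryParts:
    """Hypothetical container for the outputs of the upstream classifiers."""
    table: str
    select_column: str
    conditions: list = field(default_factory=list)  # (column, op, value)

def quote(value):
    """Quote string literals SQL-style (doubling single quotes); leave numbers as-is."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)

def build_sql(parts):
    sql = f'SELECT "{parts.select_column}" FROM "{parts.table}"'
    if parts.conditions:
        clauses = [f'"{col}" {op} {quote(val)}' for col, op, val in parts.conditions]
        sql += " WHERE " + " AND ".join(clauses)
    return sql + ";"

parts = QueryParts("cities", "population", [("name", "=", "Pittsburgh")])
print(build_sql(parts))  # SELECT "population" FROM "cities" WHERE "name" = 'Pittsburgh';
```

String-built SQL is used here only to show the assembled query. A system executing these queries should pass values as bound parameters instead.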

URL

http://arxiv.org/abs/1903.07113

PDF

http://arxiv.org/pdf/1903.07113
Adaptive Genomic Evolution of Neural Network Topologies for State-to-Action Mapping in Autonomous Agents

2019-03-17

Amir Behjat, Sharat Chidambaran, Souma Chowdhury

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

Neuroevolution is a process of training neural networks (NN) through an evolutionary algorithm, usually to serve as a state-to-action mapping model in control or reinforcement learning-type problems. This paper builds on the NeuroEvolution of Augmenting Topologies (NEAT) formalism, which allows designing NNs whose topology and weights both evolve. Fundamental advancements are made to the neuroevolution process to address premature stagnation and convergence issues, central among which is the incorporation of automated mechanisms to control the population diversity and average fitness improvement within the neuroevolution process. Insights into the performance and efficiency of the new algorithm are obtained by evaluating it on three benchmark problems from the OpenAI platform and an Unmanned Aerial Vehicle (UAV) collision avoidance problem.
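
Standard NEAT, the formalism this paper builds on, maintains diversity through speciation. It uses the compatibility distance $\delta = c_1 E/N + c_2 D/N + c_3 \bar{W}$, where $E$ counts excess genes, $D$ counts disjoint genes, $\bar{W}$ is the mean weight difference of matching genes, and $N$ is the size of the larger genome. The sketch below is this original NEAT measure, not the paper's new diversity-control mechanism. For brevity it assumes genomes are stored as {innovation number: weight} dicts, and the coefficients are illustrative defaults.

```python
def compatibility(g1, g2, c1=1.0, c2=1.0, c3=0.4):
    """NEAT compatibility distance between two genomes {innovation: weight}."""
    innov1, innov2 = set(g1), set(g2)
    matching = innov1 & innov2
    mismatched = innov1 ^ innov2
    # Excess genes lie beyond the other genome's highest innovation number.
    cutoff = min(max(innov1), max(innov2))
    excess = sum(1 for i in mismatched if i > cutoff)
    disjoint = len(mismatched) - excess
    n = max(len(g1), len(g2))
    w_bar = (sum(abs(g1[i] - g2[i]) for i in matching) / len(matching)
             if matching else 0.0)
    return c1 * excess / n + c2 * disjoint / n + c3 * w_bar

g1 = {1: 0.5, 2: 1.0, 3: -0.5}
g2 = {1: 0.0, 3: -0.5, 4: 0.2, 5: 0.1}
print(compatibility(g1, g2))  # 2 excess, 1 disjoint, mean weight diff 0.25 -> about 0.85
```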

URL

http://arxiv.org/abs/1903.07107

PDF

http://arxiv.org/pdf/1903.07107
The Missing Ingredient in Zero-Shot Neural Machine Translation

2019-03-17

Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, Wolfgang Macherey

arXiv_AI

arXiv_AI NMT
Abstract

Multilingual Neural Machine Translation (NMT) models are capable of translating between multiple source and target languages. Despite various approaches to training such models, they have difficulty with zero-shot translation: translating between language pairs that were not seen together during training. In this paper, we first diagnose why state-of-the-art multilingual NMT models that rely purely on parameter sharing fail to generalize to unseen language pairs. We then propose auxiliary losses on the NMT encoder that impose representational invariance across languages. Our simple approach vastly improves zero-shot translation quality without regressing on supervised directions. For the first time, on WMT14 English-French-German, we achieve zero-shot performance that is on par with pivoting. We also demonstrate the easy scalability of our approach to multiple languages on the IWSLT 2017 shared task.
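
Here is one possible form of an auxiliary invariance loss: the cosine distance between mean-pooled encoder outputs of a sentence and its translation. The pooling choice, the loss form, and the toy vectors are assumptions for illustration, and the paper's exact losses may differ.

```python
import math

def mean_pool(states):
    """Average a list of equal-length encoder state vectors."""
    n = len(states)
    return [sum(col) / n for col in zip(*states)]

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def alignment_loss(enc_src, enc_tgt):
    """Auxiliary loss: near 0 when the pooled representations point the same way."""
    return cosine_distance(mean_pool(enc_src), mean_pool(enc_tgt))
```

In training, such a term would be added to the translation loss with some weight. This pushes the encoder to map a sentence and its translation to nearby points regardless of language.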

URL

http://arxiv.org/abs/1903.07091

PDF

http://arxiv.org/pdf/1903.07091
A Weighted Multi-Criteria Decision Making Approach for Image Captioning

2019-03-17

Hassan Maleki Galandouz, Mohsen Ebrahimi Moghaddam, Mehrnoush Shamsfard

arXiv_CV

arXiv_CV Image_Caption Attention Caption
Abstract

Image captioning aims at automatically generating descriptions of an image in natural language. This is a challenging problem in artificial intelligence that has recently received significant attention in the computer vision and natural language processing communities. Among the existing approaches, visual retrieval based methods have proven highly effective. These approaches search for similar images, then build a caption for the query image based on the captions of the retrieved images. In this study, we present a method for visual retrieval based image captioning, in which we use a multi-criteria decision making algorithm to effectively combine several criteria with proportional impact weights to retrieve the most relevant caption for the query image. The main idea of the proposed approach is to design a mechanism that retrieves captions that are more semantically relevant to the query image, and then selects the most appropriate caption, imitating human judgment, with a weighted multi-criteria decision making algorithm. Experiments conducted on the MS COCO benchmark dataset show that, by using criteria with proportional impact weights, the proposed method provides much more effective results than state-of-the-art models.
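
The caption selection step can be sketched as a weighted sum. This assumes each retrieved candidate caption already has per-criterion scores normalized to [0, 1]. The criterion names and weights are hypothetical, and the paper's specific multi-criteria algorithm may not be a plain weighted sum.

```python
def select_caption(candidates, weights):
    """Return the caption with the highest weighted sum of criterion scores.

    candidates: {caption: {criterion: score in [0, 1]}}
    weights: {criterion: impact weight}, normalized here to sum to 1.
    """
    total = sum(weights.values())
    norm = {c: w / total for c, w in weights.items()}

    def score(caption):
        return sum(norm[c] * candidates[caption][c] for c in norm)

    return max(candidates, key=score)

candidates = {
    "a man riding a wave on a surfboard": {"visual_sim": 0.9, "consensus": 0.6},
    "a person standing on a beach": {"visual_sim": 0.7, "consensus": 0.8},
}
print(select_caption(candidates, {"visual_sim": 2.0, "consensus": 1.0}))
```

Changing the impact weights changes which caption wins. With `consensus` weighted twice as heavily, the second caption is selected instead.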

URL

http://arxiv.org/abs/1904.00766

PDF

http://arxiv.org/pdf/1904.00766
STNReID: Deep Convolutional Networks with Pairwise Spatial Transformer Networks for Partial Person Re-identification

2019-03-17

Hao Luo, Xing Fan, Chi Zhang, Wei Jiang

arXiv_CV

arXiv_CV Re-identification Person_Re-identification CNN Deep_Learning
Abstract

Partial person re-identification (ReID) is a challenging task because only partial information of person images is available for matching target persons. Few studies, especially in deep learning, have focused on matching partial person images with holistic person images. This study presents a novel deep partial ReID framework based on pairwise spatial transformer networks (STNReID), which can be trained on existing holistic person datasets. STNReID includes a spatial transformer network (STN) module and a ReID module. The STN module samples an affined image (a semantically corresponding patch) from the holistic image to match the partial image. The ReID module extracts the features of the holistic, partial, and affined images. Competition (or confrontation) is observed between the STN module and the ReID module, and two-stage training is applied to acquire a strong STNReID for partial ReID. Experimental results show that our STNReID obtains 66.7% and 54.6% rank-1 accuracies on the Partial ReID and Partial iLIDS datasets, respectively. These values are on par with those obtained with state-of-the-art methods.

URL

http://arxiv.org/abs/1903.07072

PDF

http://arxiv.org/pdf/1903.07072
Bags of Tricks and A Strong Baseline for Deep Person Re-identification

2019-03-17

Hao Luo, Youzhi Gu, Xingyu Liao, Shenqi Lai, Wei Jiang

arXiv_CV

arXiv_CV Re-identification Person_Re-identification
Abstract

This paper explores a simple and efficient baseline for person re-identification (ReID). Person ReID with deep neural networks has made progress and achieved high performance in recent years. However, many state-of-the-art methods design complex network structures and concatenate multi-branch features. In the literature, some effective training tricks appear only briefly in several papers or source code. This paper collects and evaluates these effective training tricks in person ReID. By combining these tricks, the model achieves 94.5% rank-1 and 85.9% mAP on Market1501 using only global features. Our code and models are available on GitHub.
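
Rank-1 accuracy and mAP, the two figures reported above, are computed from a query-gallery distance matrix. Below is a simplified sketch on toy data. It deliberately omits the standard Market1501 protocol step of discarding gallery images of the same identity taken by the query's camera.

```python
def average_precision(ranked_ids, query_id):
    """AP of one query: mean of precision@k at each correct match position."""
    hits, precisions = 0, []
    for k, gid in enumerate(ranked_ids, start=1):
        if gid == query_id:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def evaluate(dist, query_ids, gallery_ids):
    """dist[q][g]: distance between query q and gallery item g. Returns (rank-1, mAP)."""
    rank1, aps = 0, []
    for q, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda g: row[g])
        ranked = [gallery_ids[g] for g in order]
        rank1 += ranked[0] == query_ids[q]
        aps.append(average_precision(ranked, query_ids[q]))
    return rank1 / len(dist), sum(aps) / len(aps)

dist = [[0.1, 0.5, 0.3], [0.2, 0.9, 0.4]]
print(evaluate(dist, query_ids=[1, 2], gallery_ids=[1, 2, 1]))
```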

URL

http://arxiv.org/abs/1903.07071

PDF

http://arxiv.org/pdf/1903.07071
Spatiotemporal Filtering for Event-Based Action Recognition

2019-03-17

Rohan Ghosh, Anupam Gupta, Andrei Nakagawa, Alcimar Soares, Nitish Thakor

arXiv_CV

arXiv_CV Sparse Action_Recognition CNN Recognition
Abstract

In this paper, we address the challenging problem of action recognition using event-based cameras. Recognizing most gestural actions often requires higher temporal precision when sampling visual information. Actions are defined by motion; therefore, when using event-based cameras, it is often unnecessary to re-sample the entire scene. Neuromorphic, event-based cameras offer an alternative approach to visual information acquisition by asynchronously time-encoding pixel intensity changes through temporally precise spikes (10-microsecond resolution), making them well equipped for action recognition. However, other challenges intrinsic to event-based imagers exist, such as higher signal-to-noise ratio and spatiotemporally sparse information. One option is to convert event data into frames, but this could result in a significant loss of temporal precision. In this work, we introduce spatiotemporal filtering in the spike-event domain as an alternative way of channeling spatiotemporal information through to a convolutional neural network. The filters are local spatiotemporal weight matrices, learned from the spike-event data in an unsupervised manner. We find that appropriate spatiotemporal filtering significantly improves CNN performance beyond the state of the art on the event-based DVS Gesture dataset. On our newly recorded action recognition dataset, our method shows significant improvement compared with other standard ways of generating the spatiotemporal filters.

URL

http://arxiv.org/abs/1903.07067

PDF

http://arxiv.org/pdf/1903.07067
AdaGraph: Unifying Predictive and Continuous Domain Adaptation through Graphs

2019-03-17

Massimiliano Mancini, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci

arXiv_CV

arXiv_CV
Abstract

The ability to categorize is a cornerstone of visual intelligence, and a key functionality for artificial, autonomous visual machines. This problem will never be solved without algorithms able to adapt and generalize across visual domains. Within the context of domain adaptation and generalization, this paper focuses on the predictive domain adaptation scenario, namely the case where no target data are available and the system has to learn to generalize from annotated source images plus unlabeled samples with associated metadata from auxiliary domains. Our contribution is the first deep architecture that tackles predictive domain adaptation, able to leverage the information brought by the auxiliary domains through a graph. Moreover, we present a simple yet effective strategy that allows us to take advantage of the incoming target data at test time, in a continuous domain adaptation scenario. Experiments on three benchmark databases support the value of our approach.
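
The graph idea can be sketched under stated assumptions. Each source or auxiliary domain is a node described by a metadata vector. Edges are weighted by a Gaussian kernel on metadata distance. The domain-specific parameters of an unseen target domain are predicted as the edge-weighted average of the other nodes' parameters. The kernel, `sigma`, and the small parameter vectors are hypothetical; in the paper the predicted quantities belong to a deep network.

```python
import math

def edge_weight(m1, m2, sigma=1.0):
    """Gaussian kernel on the distance between two metadata vectors."""
    return math.exp(-math.dist(m1, m2) ** 2 / (2 * sigma ** 2))

def predict_params(target_meta, domains, sigma=1.0):
    """Predict target-domain parameters from (metadata, parameters) pairs of known domains."""
    weights = [edge_weight(target_meta, meta, sigma) for meta, _ in domains]
    total = sum(weights)
    dim = len(domains[0][1])
    return [sum(w * params[k] for w, (_, params) in zip(weights, domains)) / total
            for k in range(dim)]

domains = [((0.0,), [0.0]), ((2.0,), [4.0])]
print(predict_params((1.0,), domains))  # equidistant target -> [2.0]
```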

URL

http://arxiv.org/abs/1903.07062

PDF

http://arxiv.org/pdf/1903.07062
Discriminating Original Region from Duplicated One in Copy-Move Forgery

2019-03-17

Saba Salehi, Ahmad Mahmoodi-Aznaveh

arXiv_CV

arXiv_CV Detection
Abstract

Since images are used as evidence in many cases, validation of digital images is essential. Copy-move forgery is a special kind of manipulation in which some parts of an image are copied and pasted into another part of the same image. Various methods have been proposed to detect copy-move forgery, and they have achieved promising results. In previous methods, a binary mask marking the original and forged regions is presented as the final result. However, it is not specified which part of the mask is the forged region. It should be noted that discriminating the original region from the duplicated one is not usually feasible for the human visual system (HVS). On the other hand, exactly localizing the forged region can be helpful for automatic forgery detection, especially in combined forgeries. In real-world forgeries, some manipulations are performed in order to produce a visually realistic scene. These modifications are usually applied on the boundary of the duplicated snippets. In this research, the texture information of the border regions of both the original and copied patches has been statistically investigated. Based on this analysis, we propose a method to discriminate copied snippets from original ones. To validate our method, the GRIP dataset is used, since it contains more realistic forged images which are not easily recognizable by the HVS.

URL

http://arxiv.org/abs/1903.07044

PDF

http://arxiv.org/pdf/1903.07044
Audio De-identification: A New Entity Recognition Task

2019-03-17

Ido Cohn, Itay Laish, Genady Beryozkin, Gang Li, Izhak Shafran, Idan Szpektor, Tzvika Hartman, Avinatan Hassidim, Yossi Matias

arXiv_CL

arXiv_CL Speech_Recognition Recognition
Abstract

Named Entity Recognition (NER) has been mostly studied in the context of written text. Specifically, NER is an important step in de-identification (de-ID) of medical records, many of which are recorded conversations between a patient and a doctor. In such recordings, audio spans with personal information should be redacted, similar to the redaction of sensitive character spans in de-ID for written text. The application of NER in the context of audio de-identification has yet to be fully investigated. To this end, we define the task of audio de-ID, in which audio spans with entity mentions should be detected. We then present our pipeline for this task, which involves Automatic Speech Recognition (ASR), NER on the transcript text, and text-to-audio alignment. Finally, we introduce a novel metric for audio de-ID and a new evaluation benchmark consisting of a large labeled segment of the Switchboard and Fisher audio datasets and detail our pipeline’s results on it.
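
The text-to-audio alignment step can be sketched as follows. This assumes the ASR system returns per-word start and end times and that NER returns entity spans as inclusive word-index ranges. The data layout and the `pad` safety margin are hypothetical. Overlapping or touching audio spans are merged before redaction.

```python
def entity_audio_spans(word_times, entity_spans, pad=0.0):
    """Map entity word spans to merged (start, end) audio intervals in seconds.

    word_times: [(start_sec, end_sec)] for each transcript word.
    entity_spans: [(first_word_idx, last_word_idx)], inclusive.
    """
    intervals = sorted(
        (max(0.0, word_times[i][0] - pad), word_times[j][1] + pad)
        for i, j in entity_spans
    )
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Transcript words: "my name is John Smith from Boston"
times = [(0.0, 0.2), (0.2, 0.5), (0.5, 0.6), (0.6, 0.9), (0.9, 1.3), (1.3, 1.5), (1.5, 2.0)]
print(entity_audio_spans(times, [(3, 4), (6, 6)]))  # [(0.6, 1.3), (1.5, 2.0)]
```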

URL

http://arxiv.org/abs/1903.07037

PDF

http://arxiv.org/pdf/1903.07037
Integrative Analysis of Patient Health Records and Neuroimages via Memory-based Graph Convolutional Network

2019-03-17

Xi Sheryl Zhang, Jingyuan Chou, Fei Wang

arXiv_CV

arXiv_CV CNN Classification
Abstract

With the arrival of the big data era, more and more data are becoming readily available in various real-world applications, and those data are usually highly heterogeneous. Taking computational medicine as an example, we have both Electronic Health Records (EHR) and medical images for each patient. For complicated diseases such as Parkinson’s and Alzheimer’s, both EHR and neuroimaging information are very important for disease understanding because they contain complementary aspects of the disease. However, EHR and neuroimages are completely different modalities, and existing research has so far mainly focused on one of them. In this paper, we propose a framework, Memory-Based Graph Convolution Network (MemGCN), to perform integrative analysis of such multi-modal data. Specifically, a GCN is used to extract useful information from the patients’ neuroimages. The information contained in the patient EHRs before the acquisition of each brain image is captured by a memory network because of its sequential nature. The information contained in each brain image is combined with the information read out from the memory network to infer the disease state at the image acquisition timestamp. To further enhance the analytical power of MemGCN, we also designed a multi-hop strategy that allows multiple reads and updates of the memory at each iteration. We conduct experiments using patient data from the Parkinson’s Progression Markers Initiative (PPMI) on the task of classifying Parkinson’s Disease (PD) cases versus controls. We demonstrate that our proposed framework achieves superior classification performance compared with existing approaches that use a single type of data.
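
The abstract does not give MemGCN's layer equations. As a reference point, here is a generic graph convolution layer in the widely used Kipf and Welling form, $H' = \mathrm{ReLU}(\hat{D}^{-1/2} (A + I) \hat{D}^{-1/2} H W)$. It is written in pure Python on a toy graph. Its use here as a stand-in for the neuroimage branch, and the toy matrices, are assumptions.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, h, w):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, x) for x in row] for row in out]

# Two connected nodes with scalar features 1 and 3 end up with the same value.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
```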

URL

https://arxiv.org/abs/1809.06018

PDF

https://arxiv.org/pdf/1809.06018
Reconstructing neuronal anatomy from whole-brain images

2019-03-17

James Gornet, Kannan Umadevi Venkataraju, Arun Narasimhan, Nicholas Turner, Kisuk Lee, H. Sebastian Seung, Pavel Osten, Uygar Sümbül

arXiv_CV

arXiv_CV GAN
Abstract

Reconstructing multiple molecularly defined neurons from individual brains and across multiple brain regions can reveal organizational principles of the nervous system. However, high resolution imaging of the whole brain is a technically challenging and slow process. Recently, oblique light sheet microscopy has emerged as a rapid imaging method that can provide whole brain fluorescence microscopy at a voxel size of 0.4 by 0.4 by 2.5 cubic microns. On the other hand, complex image artifacts due to whole-brain coverage produce apparent discontinuities in neuronal arbors. Here, we present connectivity-preserving methods and data augmentation strategies for supervised learning of neuroanatomy from light microscopy using neural networks. We quantify the merit of our approach by implementing an end-to-end automated tracing pipeline. Lastly, we demonstrate a scalable, distributed implementation that can reconstruct the large datasets that sub-micron whole-brain images produce.

URL

http://arxiv.org/abs/1903.07027

PDF

http://arxiv.org/pdf/1903.07027
Responses to a Critique of Artificial Moral Agents

2019-03-17

Adam Poulsen, Michael Anderson, Susan L. Anderson, Ben Byford, Fabio Fossa, Erica L. Neely, Alejandro Rosas, Alan Winfield

arXiv_AI

arXiv_AI
Abstract

The field of machine ethics is concerned with the question of how to embed ethical behaviors, or a means to determine ethical behaviors, into artificial intelligence (AI) systems. The goal is to produce artificial moral agents (AMAs) that are either implicitly ethical (designed to avoid unethical consequences) or explicitly ethical (designed to behave ethically). Van Wynsberghe and Robbins’ (2018) paper Critiquing the Reasons for Making Artificial Moral Agents critically addresses the reasons offered by machine ethicists for pursuing AMA research; this paper, co-authored by machine ethicists and commentators, aims to contribute to the machine ethics conversation by responding to that critique. The reasons for developing AMAs discussed in van Wynsberghe and Robbins (2018) are: it is inevitable that they will be developed; the prevention of harm; the necessity for public trust; the prevention of immoral use; such machines are better moral reasoners than humans, and building these machines would lead to a better understanding of human morality. In this paper, each co-author addresses those reasons in turn. In so doing, this paper demonstrates that the reasons critiqued are not shared by all co-authors; each machine ethicist has their own reasons for researching AMAs. But while we express a diverse range of views on each of the six reasons in van Wynsberghe and Robbins’ critique, we nevertheless share the opinion that the scientific study of AMAs has considerable value.

URL

http://arxiv.org/abs/1903.07021

PDF

http://arxiv.org/pdf/1903.07021
Patch Clustering for Representation of Histopathology Images

2019-03-17

Wafa Chenni, Habib Herbi, Morteza Babaie, H.R. Tizhoosh

arXiv_CV

arXiv_CV GAN
Abstract

Whole Slide Imaging (WSI) has become an important topic during the last decade. Even though significant progress in both medical image processing and computational resources has been achieved, there are still problems in WSI that need to be solved. A major challenge is the scan size. The dimensions of digitized tissue samples may exceed 100,000 by 100,000 pixels, causing memory and efficiency obstacles for real-time processing. The main contribution of this work is representing a WSI by selecting a small number of patches for algorithmic processing (e.g., indexing and search). As a result, we reduced the search time and storage by $50\%$ to $90\%$, while losing only a few percentage points in patch retrieval accuracy. A self-organizing map (SOM) has been applied to local binary patterns (LBP) and deep features of the KimiaPath24 dataset in order to cluster patches that share the same characteristics. We used a Gaussian mixture model (GMM) to represent each class with a rather small ($10\%$ to $50\%$) portion of patches. The results showed that LBP features can outperform deep features. By selecting only $50\%$ of all patches after SOM clustering and GMM patch selection, we obtained $65\%$ accuracy for retrieval of the best match, while the maximum accuracy (using all patches) was $69\%$.
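
Here is the basic 8-neighbor, radius-1 local binary pattern (LBP), the texture descriptor compared above. Each neighbor contributes a 1 bit if it is at least as bright as the center pixel, and a patch is described by the histogram of these codes. The clockwise neighbor ordering is one common convention. The abstract does not state the paper's exact LBP variant or histogram settings.

```python
def lbp_code(img, y, x):
    """Basic 3x3 LBP code of pixel (y, x) in a 2D list of intensities."""
    center = img[y][x]
    # Neighbors clockwise from the top-left; bit k is set if neighbor k >= center.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over interior pixels, used as a patch feature."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist

print(lbp_code([[10, 10, 10], [0, 9, 0], [0, 0, 0]], 1, 1))  # 7 (top row brighter)
```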

URL

http://arxiv.org/abs/1903.07013

PDF

http://arxiv.org/pdf/1903.07013
Dual Encoding for Zero-Example Video Retrieval

2019-03-17

Jianfeng Dong, Xirong Li, Chaoxi Xu, Shouling Ji, Yuan He, Gang Yang, Xun Wang

arXiv_CV

arXiv_CV
Abstract

This paper attacks the challenging problem of zero-example video retrieval. In such a retrieval paradigm, an end user searches for unlabeled videos by ad-hoc queries described in natural language text with no visual example provided. Given videos as sequences of frames and queries as sequences of words, an effective sequence-to-sequence cross-modal matching is required. The majority of existing methods are concept based, extracting relevant concepts from queries and videos and accordingly establishing associations between the two modalities. In contrast, this paper takes a concept-free approach, proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own. Dual encoding is conceptually simple, practically effective and end-to-end. As experiments on three benchmarks, i.e. MSR-VTT, TRECVID 2016 and 2017 Ad-hoc Video Search show, the proposed solution establishes a new state-of-the-art for zero-example video retrieval.
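
Once videos and queries are encoded into dense representations in a common space, zero-example retrieval amounts to ranking videos by their similarity to the query vector. Below is a minimal sketch using cosine similarity. The embeddings are hypothetical toy vectors standing in for the outputs of the two encoders, and the similarity function actually used in the paper may differ.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_videos(query_vec, video_vecs):
    """Return video ids sorted from most to least similar to the query."""
    return sorted(video_vecs, key=lambda vid: cosine(query_vec, video_vecs[vid]), reverse=True)

videos = {"v1": [0.9, 0.1, 0.0], "v2": [0.1, 0.9, 0.2], "v3": [0.5, 0.5, 0.0]}
print(rank_videos([1.0, 0.0, 0.0], videos))  # ['v1', 'v3', 'v2']
```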

URL

http://arxiv.org/abs/1809.06181

PDF

http://arxiv.org/pdf/1809.06181
Deep Features for Tissue-Fold Detection in Histopathology Images

2019-03-17

Morteza Babaie, H.R. Tizhoosh

arXiv_CV

arXiv_CV CNN Detection
Abstract

Whole slide imaging (WSI) refers to the digitization of a tissue specimen, which enables pathologists to explore high-resolution images on a monitor rather than through a microscope. Tissue folds form during tissue processing. Their presence may not only cause out-of-focus digitization but can also negatively affect the diagnosis in some cases. In this paper, we have compared five pre-trained convolutional neural networks (CNNs) of different depths as feature extractors to characterize tissue folds. We have also explored common classifiers to discriminate folded tissue from normal tissue in hematoxylin and eosin (H&E) stained biopsy samples. In our experiments, we manually select the folded area in roughly 2.5mm $\times$ 2.5mm patches at $20$x magnification level as the training data. DenseNet with 201 layers alongside an SVM classifier outperformed all other configurations. Based on the leave-one-out validation strategy, we achieved $96.3\%$ accuracy, whereas with augmentation the accuracy increased to $97.2\%$. We have tested the generalization of our method with five unseen WSIs from the NIH (National Cancer Institute) dataset. The accuracy for patch-wise detection was $81\%$. One folded patch within an image suffices to flag the entire specimen for visual inspection.
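
Leave-one-out validation trains on all samples but one, tests on the held-out sample, and repeats this for every sample. Below is a minimal sketch. As a deliberate simplification, a 1-nearest-neighbor classifier stands in for the paper's DenseNet-feature plus SVM pipeline, and the feature vectors are toy values. The last function encodes the rule that one folded patch suffices to flag a specimen.

```python
import math

def nn_predict(train, x):
    """1-nearest-neighbor label; train is a list of (feature_vector, label)."""
    return min(train, key=lambda s: math.dist(s[0], x))[1]

def leave_one_out_accuracy(samples):
    """Hold out each sample once, predict it from all the others."""
    correct = 0
    for i, (x, y) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        correct += nn_predict(train, x) == y
    return correct / len(samples)

def flag_slide(patch_predictions):
    """Flag the whole specimen for inspection if any patch is predicted 'fold'."""
    return any(p == "fold" for p in patch_predictions)

samples = [((0.0, 0.0), "normal"), ((0.1, 0.0), "normal"),
           ((1.0, 1.0), "fold"), ((1.1, 1.0), "fold")]
print(leave_one_out_accuracy(samples))  # 1.0
```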

URL

http://arxiv.org/abs/1903.07011

PDF

http://arxiv.org/pdf/1903.07011
Leveling the Playing Field - Fairness in AI Versus Human Game Benchmarks

2019-03-17

Rodrigo Canaan, Christoph Salge, Julian Togelius, Andy Nealen

arXiv_AI

arXiv_AI Review
Abstract

From the beginning of the history of AI, there has been interest in games as a platform for research. As the field developed, human-level competence in complex games became a target researchers worked to reach. Only relatively recently has this target finally been met for traditional tabletop games such as Backgammon, Chess and Go. The current research focus has shifted to electronic games, which provide unique challenges. As is often the case with AI research, these results are liable to be exaggerated or misrepresented by either authors or third parties. The extent to which these game benchmarks constitute fair competition between humans and AI is also a matter of debate. In this work, we review the statements made by authors and third parties in the general media and academic circles about these game benchmark results, and discuss factors that can impact the perception of fairness in the contest between humans and machines.

URL

http://arxiv.org/abs/1903.07008

PDF

http://arxiv.org/pdf/1903.07008
Model-Based Task Transfer Learning

2019-03-16

Charlott Vallon, Francesco Borrelli

arXiv_AI

arXiv_AI Transfer_Learning
Abstract

A model-based task transfer learning (MBTTL) method is presented. We consider a constrained nonlinear dynamical system and assume that a dataset of state and input pairs that solve a task T1 is available. Our objective is to find a feasible state-feedback policy for a second task, T2, by using stored data from T1. Our approach applies to tasks T2 that are composed of the same subtasks as T1, but in a different order. In this paper, we formally define subtasks and the MBTTL problem, and provide examples of MBTTL in the fields of autonomous cars and manipulators. Then, a computationally efficient approach to solve the MBTTL problem is presented, along with proofs of feasibility for constrained linear dynamical systems. Simulation results show the effectiveness of the proposed method.

URL

http://arxiv.org/abs/1903.07003

PDF

http://arxiv.org/pdf/1903.07003
GFD-SSD: Gated Fusion Double SSD for Multispectral Pedestrian Detection

2019-03-16

Yang Zheng, Izzat H. Izzat, Shahrzad Ziaee

arXiv_CV

arXiv_CV Object_Detection Inference Detection
Abstract

Pedestrian detection is an essential task in autonomous driving research. In addition to typical color images, thermal images benefit detection in dark environments. Hence, it is worthwhile to explore an integrated approach that takes advantage of both color and thermal images simultaneously. In this paper, we propose a novel approach to fuse color and thermal sensors using deep neural networks (DNN). Current state-of-the-art DNN object detectors range from two-stage to one-stage mechanisms. Two-stage detectors, like Faster-RCNN, achieve higher accuracy, while one-stage detectors such as Single Shot Detector (SSD) are faster. To balance this trade-off, especially with autonomous driving applications in mind, we investigate a fusion strategy that combines two SSDs on color and thermal inputs. Traditional fusion methods stack selected features from each channel and adjust their weights. In this paper, we propose two variations of novel Gated Fusion Units (GFU) that learn the combination of feature maps generated by the two SSD middle layers. Leveraging GFUs across the entire feature pyramid structure, we propose several mixed versions of both stack fusion and gated fusion. Experiments are conducted on the KAIST multispectral pedestrian detection dataset. Our Gated Fusion Double SSD (GFD-SSD) outperforms the stacked fusion and achieves the lowest miss rate in the benchmark, at an inference speed two times faster than Faster-RCNN based fusion networks.
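
Below is a minimal sketch of the gating idea on flattened feature vectors. A sigmoid gate computed from both modalities decides, element by element, how much of the color feature versus the thermal feature to keep. The scalar weights `w_c`, `w_t` and `b` are hypothetical stand-ins for learned convolutions. The convex-combination form is also a simplification: the paper's two GFU variants operate on SSD feature maps and may combine features differently.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(color_feat, thermal_feat, w_c=1.0, w_t=-1.0, b=0.0):
    """Element-wise gated fusion: g * color + (1 - g) * thermal, with g in (0, 1)."""
    fused = []
    for c, t in zip(color_feat, thermal_feat):
        g = sigmoid(w_c * c + w_t * t + b)  # hypothetical stand-in for a learned gate
        fused.append(g * c + (1.0 - g) * t)
    return fused

# Dark scene: weak color response, strong thermal response -> output follows thermal.
print(gated_fusion([0.0], [5.0]))
```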

URL

http://arxiv.org/abs/1903.06999

PDF

http://arxiv.org/pdf/1903.06999
Visual Query Answering by Entity-Attribute Graph Matching and Reasoning

2019-03-16

Peixi Xiong, Huayi Zhan, Xin Wang, Baivab Sinha, Ying Wu

arXiv_CV

arXiv_CV QA Inference VQA
Abstract

Visual Query Answering (VQA) is of great significance in offering people convenience: one can ask about the details of objects, or for a high-level understanding of the scene, over an image. This paper proposes a novel method to address the VQA problem. In contrast to prior works, our method, which targets single-scene VQA, relies on graph-based techniques and involves reasoning. In a nutshell, our approach is centered on three graphs. The first graph, referred to as the inference graph $G_I$, is constructed via learning over labeled data. The other two graphs, referred to as the query graph $Q$ and the entity-attribute graph $G_{EA}$, are generated from the natural language query $Q_{nl}$ and the image $Img$ issued by users, respectively. As $G_{EA}$ often does not contain sufficient information to answer $Q$, we develop techniques to infer the missing information of $G_{EA}$ with $G_I$. Based on $G_{EA}$ and $Q$, we provide techniques to find matches of $Q$ in $G_{EA}$ as the answer to $Q_{nl}$ in $Img$. Unlike commonly used VQA methods based on end-to-end neural networks, our graph-based method shows well-designed reasoning capability and is thus highly interpretable. We also create a dataset of soccer matches (Soccer-VQA) with rich annotations. The experimental results show that our approach outperforms the state-of-the-art method and has high potential for future investigation.

URL

http://arxiv.org/abs/1903.06994

PDF

http://arxiv.org/pdf/1903.06994
Multichannel Sparse Blind Deconvolution on the Sphere

2019-03-16

Yanjun Li, Yoram Bresler

arXiv_CV

arXiv_CV Sparse Optimization Gradient_Descent
Abstract

Multichannel blind deconvolution is the problem of recovering an unknown signal $f$ and multiple unknown channels $x_i$ from their circular convolutions $y_i=x_i \circledast f$ ($i=1,2,\dots,N$). We consider the case where the $x_i$’s are sparse and convolution with $f$ is invertible. Our nonconvex optimization formulation solves for a filter $h$ on the unit sphere that produces sparse outputs $y_i\circledast h$. Under some technical assumptions, we show that all local minima of the objective function correspond to the inverse filter of $f$ up to an inherent sign and shift ambiguity, and all saddle points have strictly negative curvatures. This geometric structure allows successful recovery of $f$ and $x_i$ using a simple manifold gradient descent (MGD) algorithm. Our theoretical findings are complemented by numerical experiments, which demonstrate the superior performance of the proposed approach over previous methods.
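
The core of manifold gradient descent on the unit sphere is a projected step: remove the radial component of the Euclidean gradient, step in the tangent plane, and renormalise back onto the sphere. The sketch below assumes a user-supplied Euclidean gradient function `grad_fn`; the toy objective in the example (minimising $h^\top A h$ on the sphere, whose minimiser is the eigenvector of the smallest eigenvalue of $A$) is a stand-in, not the paper's sparsity-promoting objective, and the fixed step size is an assumption.

```python
import numpy as np

def sphere_gradient_step(h, euclid_grad, step):
    """One Riemannian gradient step on the unit sphere."""
    # Project onto the tangent space at h: g - (h . g) h
    riem_grad = euclid_grad - np.dot(h, euclid_grad) * h
    h_new = h - step * riem_grad
    # Retraction: renormalise back onto the sphere
    return h_new / np.linalg.norm(h_new)

def manifold_gradient_descent(grad_fn, h0, step=0.1, iters=500):
    h = h0 / np.linalg.norm(h0)
    for _ in range(iters):
        h = sphere_gradient_step(h, grad_fn(h), step)
    return h

# Toy objective f(h) = h^T A h with Euclidean gradient 2 A h
A = np.diag([3.0, 2.0, 1.0])
h = manifold_gradient_descent(lambda v: 2.0 * A @ v, np.ones(3))
# h approaches the eigenvector of the smallest eigenvalue, here (0, 0, 1)
```

The same two operations (tangent projection and renormalisation) apply unchanged when `grad_fn` is replaced by the gradient of a deconvolution objective in $h$.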

URL

http://arxiv.org/abs/1805.10437

PDF

http://arxiv.org/pdf/1805.10437
Domain adaptation for holistic skin detection

2019-03-16

Aloisio Dourado, Frederico Guth, Teofilo Emidio de Campos, Li Weigang

arXiv_CV

arXiv_CV CNN Transfer_Learning Detection
Abstract

Human skin detection in images is a widely studied topic in Computer Vision, for which it is commonly accepted that analysis of pixel color or local patches may suffice. This is because skin regions appear to be relatively uniform, and many argue that chromatic variation among different samples is small. However, we found that there are strong biases in the datasets commonly used to train or tune skin detection methods. Furthermore, the lack of contextual information may hinder the performance of local approaches. In this paper, we present a comprehensive evaluation of holistic and local Convolutional Neural Network (CNN) approaches in in-domain and cross-domain experiments and compare them with state-of-the-art pixel-based approaches. We also propose a combination of inductive transfer learning and unsupervised domain adaptation methods, which are evaluated on different domains with varying amounts of labelled data. We show a clear superiority of CNN over pixel-based approaches even without labelled training samples in the target domain. Furthermore, we provide experimental support for the counter-intuitive superiority of holistic over local approaches for human skin detection.

URL

http://arxiv.org/abs/1903.06969

PDF

http://arxiv.org/pdf/1903.06969
Translating and Segmenting Multimodal Medical Volumes with Cycle- and Shape-Consistency Generative Adversarial Network

2019-03-16

Zizhao Zhang, Lin Yang, Yefeng Zheng

arXiv_CV

arXiv_CV Adversarial Segmentation CNN
Abstract

Synthesized medical images have several important applications, e.g., as an intermediate representation in cross-modality image registration and as supplementary training samples to boost the generalization capability of a classifier. In particular, synthesized computed tomography (CT) data can provide an X-ray attenuation map for radiation therapy planning. In this work, we propose a generic cross-modality synthesis approach with the following targets: 1) synthesizing realistic-looking 3D images using unpaired training data, 2) ensuring consistent anatomical structures, which could be changed by geometric distortion in cross-modality synthesis, and 3) improving volume segmentation by using synthetic data for modalities with limited training samples. We show that these goals can be achieved with an end-to-end 3D convolutional neural network (CNN) composed of mutually beneficial generators and segmentors for image synthesis and segmentation tasks. The generators are trained with an adversarial loss, a cycle-consistency loss, and also a shape-consistency loss, which is supervised by the segmentors, to reduce geometric distortion. From the segmentation view, the segmentors are boosted by synthetic data from the generators in an online manner. Generators and segmentors help each other alternately in an end-to-end training fashion. With extensive experiments on a dataset including a total of 4,496 CT and magnetic resonance imaging (MRI) cardiovascular volumes, we show that the two tasks benefit each other and that coupling them results in better performance than solving them separately.

URL

http://arxiv.org/abs/1802.09655

PDF

http://arxiv.org/pdf/1802.09655
Generative Adversarial Networks: recent developments

2019-03-16

Maciej Zamorski, Adrian Zdobylak, Maciej Zięba, Jerzy Świątek

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

In traditional generative modeling, a good data representation is very often the basis for a good machine learning model. This can be linked to good representations encoding more of the explanatory factors hidden in the original data. With the invention of Generative Adversarial Networks (GANs), a subclass of generative models able to learn representations in an unsupervised and semi-supervised fashion, we can now adversarially learn good mappings from a simple prior distribution to a target data distribution. This paper presents an overview of recent developments in GANs with a focus on learning latent space representations.

URL

http://arxiv.org/abs/1903.12266

PDF

http://arxiv.org/pdf/1903.12266
Imbalanced multi-label classification using multi-task learning with extractive summarization

2019-03-16

John Brandt

arXiv_CL

arXiv_CL Summarization RNN Classification
Abstract

Extractive summarization and imbalanced multi-label classification often require vast amounts of training data to avoid overfitting. In situations where training data is expensive to generate, leveraging information between tasks is an attractive approach to increasing the amount of available information. This paper employs multi-task training of an extractive summarizer and an RNN-based classifier to improve summarization and classification accuracy by 50% and 75%, respectively, relative to RNN baselines. We hypothesize that concatenating sentence encodings based on document and class context increases generalizability for highly variable corpora.

URL

http://arxiv.org/abs/1903.06963

PDF

http://arxiv.org/pdf/1903.06963
Hand range of motion evaluation for Rheumatoid Arthritis patients

2019-03-16

Luciano Walenty Xavier Cejnog, Roberto Marcondes Cesar Jr., Teofilo Emidio de Campos, Valeria Meirelles Carril Elui

arXiv_CV

arXiv_CV Pose_Estimation Tracking CNN
Abstract

We introduce a framework for the dynamic evaluation of finger movements: flexion, extension, abduction and adduction. This framework estimates angle measurements from joints computed by a hand pose estimation algorithm using a depth sensor (Realsense SR300). Given depth maps as input, our framework uses Pose-REN, a state-of-the-art hand pose estimation method that estimates 3D hand joint positions using a deep convolutional neural network. The pose estimation algorithm runs in real time, allowing users to visualise 3D skeleton tracking results while the depth images are being acquired. Once 3D joint poses are obtained, our framework estimates a plane containing the wrist and MCP joints and measures flexion/extension and abduction/adduction angles by applying computational geometry operations with respect to this plane. We analysed flexion and abduction movement patterns using real data, extracting the movement trajectories. Our preliminary results show that this method allows automatic discrimination between the hands of patients with Rheumatoid Arthritis (RA) and those of healthy subjects. The angle between joints can be used as an indicator of current movement capabilities and function. Although the measurements can be noisy and less accurate than those obtained statically through goniometry, the acquisition is much easier, non-invasive and patient-friendly, which shows the potential of our approach. The system can be used with and without an orthosis. Our framework allows measurements to be acquired with minimal intervention and significantly reduces the evaluation time.
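
The plane-based angle measurement can be illustrated with a least-squares plane fitted through the wrist and MCP joint positions (via SVD), followed by the angle between a finger-segment vector and that plane. The joint coordinates, the function names `fit_plane` and `angle_to_plane_deg`, and the choice of SVD fitting are assumptions for illustration; the paper's exact geometric operations and joint conventions may differ.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3D points; returns (centroid, unit normal)."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[-1]

def angle_to_plane_deg(p_start, p_end, normal):
    """Angle in degrees between the segment p_start -> p_end and the plane."""
    v = np.asarray(p_end, dtype=float) - np.asarray(p_start, dtype=float)
    sin_angle = abs(np.dot(v, normal)) / np.linalg.norm(v)
    return np.degrees(np.arcsin(np.clip(sin_angle, 0.0, 1.0)))

# Hypothetical wrist and MCP coordinates in metres, not data from the paper
wrist_and_mcps = [(0.0, 0.0, 0.0), (0.08, -0.02, 0.0), (0.085, 0.0, 0.0), (0.08, 0.02, 0.0)]
centroid, normal = fit_plane(wrist_and_mcps)
# Flexion-like angle of a proximal phalanx (MCP -> PIP) relative to the palm plane
print(angle_to_plane_deg((0.085, 0.0, 0.0), (0.11, 0.0, 0.02), normal))
```

An abduction-style angle would instead compare two segment vectors after projecting them into the same plane; the SVD fit simply makes the plane estimate tolerant to small noise in the predicted joints.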

URL

http://arxiv.org/abs/1903.06949

PDF

http://arxiv.org/pdf/1903.06949
Unsupervised Part-Based Disentangling of Object Shape and Appearance

2019-03-16

Dominik Lorenz, Leonard Bereska, Timo Milbich, Björn Ommer

arXiv_CV

arXiv_CV Prediction
Abstract

Large intra-class variation is the result of changes in multiple object characteristics. Images, however, only show the superposition of different variable factors such as appearance or shape. Therefore, learning to disentangle and represent these different characteristics poses a great challenge, especially in the unsupervised case. Moreover, large object articulation calls for a flexible part-based model. We present an unsupervised approach for disentangling appearance and shape by learning parts consistently over all instances of a category. Our model for learning an object representation is trained by simultaneously exploiting invariance and equivariance constraints between synthetically transformed images. Since no part annotation or prior information on an object class is required, the approach is applicable to arbitrary classes. We evaluate our approach on a wide range of object categories and diverse tasks including pose prediction, disentangled image synthesis, and video-to-video translation. The approach outperforms the state-of-the-art on unsupervised keypoint prediction and compares favorably even against supervised approaches on the task of shape and appearance transfer.

URL

http://arxiv.org/abs/1903.06946

PDF

http://arxiv.org/pdf/1903.06946
Improving Lemmatization of Non-Standard Languages with Joint Learning

2019-03-16

Enrique Manjavacas, Ákos Kádár, Mike Kestemont

arXiv_CL

arXiv_CL Language_Model
Abstract

Lemmatization of standard languages is concerned with (i) abstracting over morphological differences and (ii) resolving token-lemma ambiguities of inflected words in order to map them to a dictionary headword. In the present paper we aim to improve lemmatization performance on a set of non-standard historical languages in which the difficulty is increased by an additional aspect (iii): spelling variation due to the lack of orthographic standards. We approach lemmatization as a string-transduction task with an encoder-decoder architecture, which we enrich with sentence context information using a hierarchical sentence encoder. We show significant improvements over the state of the art when training the sentence encoder jointly for lemmatization and language modeling. Crucially, our architecture does not require POS or morphological annotations, which are not always available for historical corpora. Additionally, we test the proposed model on a set of typologically diverse standard languages, showing results on par with or better than both a model without enhanced sentence representations and previous state-of-the-art systems. Finally, to encourage future work on processing non-standard varieties, we release the dataset of non-standard languages underlying the present study, based on openly accessible sources.

URL

http://arxiv.org/abs/1903.06939

PDF

http://arxiv.org/pdf/1903.06939
Dynamic Multi-path Neural Network

2019-03-16

Yingcheng Su, Shunfeng Zhou, Yichao Wu, Xuebo Liu, Tian Su, Ding Liang, Junjie Yan

arXiv_CV

arXiv_CV Inference Classification
Abstract

Although deeper and larger neural networks have achieved better performance, their complex structure and increasing computational cost cannot meet the demands of many resource-constrained applications. An effective way to address this problem is to use a dynamic inference mechanism. Existing methods usually choose to execute or skip an entire layer, which can only alter the depth of the network. In this paper, we propose a novel method called Dynamic Multi-path Neural Network (DMNN), which provides more path selection choices in terms of network width and depth during inference. The inference path of the network is determined by a controller, which takes into account both historical and object category information. The proposed method can be easily incorporated into most modern network architectures. Experimental results on ImageNet and CIFAR-100 demonstrate the superiority of our method in both efficiency and overall classification accuracy. Specifically, we integrate DMNN into ResNet-101 and find that our method significantly outperforms its counterparts with an encouraging 45.1% FLOPs reduction.

URL

http://arxiv.org/abs/1902.10949

PDF

http://arxiv.org/pdf/1902.10949
Spatiotemporal Feature Learning for Event-Based Vision

2019-03-16

Rohan Ghosh, Anupam Gupta, Siyi Tang, Alcimar Soares, Nitish Thakor

arXiv_CV

arXiv_CV Tracking Represenation_Learning Recognition
Abstract

Unlike conventional frame-based sensors, event-based visual sensors output information through spikes at a high temporal resolution. By encoding only changes in pixel intensity, they offer a low-power, low-latency approach to visual information sensing. To use this information for higher-level sensory tasks such as object recognition and tracking, an essential simplification step is the extraction and learning of features. An ideal feature descriptor must be robust to changes involving (i) local transformations and (ii) re-appearances of a local event pattern. To that end, we propose a novel spatiotemporal feature representation learning algorithm based on slow feature analysis (SFA). Using SFA, smoothly changing linear projections are learnt that are robust to local visual transformations. To determine whether the features can learn to be invariant to various visual transformations, feature point tracking tasks are used for evaluation. Extensive experiments across two datasets demonstrate the adaptability of the spatiotemporal feature learner to translation, scaling and rotational transformations of the feature points. More importantly, we find that the obtained feature representations are able to exploit the high temporal resolution of such event-based cameras to generate better feature tracks.
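
Slow feature analysis can be sketched in its classic linear form: whiten the input signals, then keep the projections whose temporal derivative has the smallest variance. This is the textbook linear SFA algorithm, assuming a (T, D) array of time-ordered feature vectors with a full-rank covariance; how the paper builds those vectors from event streams and uses the projections for tracking is not reproduced here.

```python
import numpy as np

def linear_sfa(x, n_components):
    """Return a (D, n_components) projection whose outputs vary most slowly.

    x: (T, D) array of time-ordered samples.
    """
    x = x - x.mean(axis=0)
    # Whitening: rotate and scale so the covariance becomes the identity
    eigval, eigvec = np.linalg.eigh(np.cov(x, rowvar=False))
    whiten = eigvec / np.sqrt(eigval)          # assumes a full-rank covariance
    z = x @ whiten
    # Covariance of the temporal differences of the whitened signal
    dcov = np.cov(np.diff(z, axis=0), rowvar=False)
    _, d_eigvec = np.linalg.eigh(dcov)         # ascending eigenvalues: slowest first
    return whiten @ d_eigvec[:, :n_components]

t = np.linspace(0, 2 * np.pi, 2000)
slow, fast = np.sin(t), np.sin(40 * t)
x = np.column_stack([slow + 0.5 * fast, 0.3 * slow - fast])   # mixed signals
w = linear_sfa(x, n_components=1)
y = (x - x.mean(axis=0)) @ w   # tracks the slow sine up to sign and scale
```

Minimising derivative variance under a unit-variance constraint is what pushes the learnt projection towards components that change smoothly, which is the robustness property the abstract relies on.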

URL

http://arxiv.org/abs/1903.06923

PDF

http://arxiv.org/pdf/1903.06923
Real time backbone for semantic segmentation

2019-03-16

Zhengeng Yang, Hongshan Yu, Qiang Fu, Wei Sun, Wenyan Jia, Mingui Sun, Zhi-Hong Mao

arXiv_CV

arXiv_CV Segmentation Attention CNN Semantic_Segmentation Deep_Learning
Abstract

The rapid development of autonomous driving in recent years presents many challenges for scene understanding. As an essential step towards scene understanding, semantic segmentation has received much attention in the past few years. Although deep learning based state-of-the-art methods have achieved great success in improving segmentation accuracy, most of them are inefficient and can hardly be used in practical applications. In this paper, we systematically analyze the computation cost of Convolutional Neural Networks (CNNs) and find that their inefficiency is mainly caused by a wide structure rather than a deep one. In addition, the success of pruning-based model compression methods shows that CNNs contain many redundant channels. We therefore designed a very narrow but deep backbone network to improve the efficiency of semantic segmentation. By plugging our network into the FCN32 segmentation architecture, the basic structure of most segmentation methods, we achieved 60.6% mIoU on the Cityscapes val set at 54 frames per second (FPS) on $1024\times2048$ inputs, which already outperforms ENet, one of the earliest real-time deep learning based segmentation methods.
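
The width-versus-depth argument can be checked with simple arithmetic: a k x k convolution costs about H·W·k²·C_in·C_out multiply-accumulates, so doubling the channel width of a stack roughly quadruples its cost, while doubling the number of layers only doubles it. The helpers below are a generic back-of-the-envelope calculator (ignoring bias, activations, and strides), not the authors' own cost analysis; the feature-map size and layer counts are arbitrary examples.

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates of one k x k conv, 'same' padding, stride 1."""
    return h * w * k * k * c_in * c_out

def stack_macs(h, w, k, width, depth):
    """MACs of `depth` identical conv layers with `width` channels each."""
    return depth * conv_macs(h, w, k, width, width)

base = stack_macs(64, 128, 3, width=64, depth=10)
wider = stack_macs(64, 128, 3, width=128, depth=10)
deeper = stack_macs(64, 128, 3, width=64, depth=20)
print(wider / base, deeper / base)  # 4.0 2.0
```

Because cost is quadratic in width but linear in depth, a narrow but deep backbone can keep a large receptive field while spending far fewer operations per layer.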

URL

http://arxiv.org/abs/1903.06922

PDF

http://arxiv.org/pdf/1903.06922
Robust Super-Resolution GAN, with Manifold-based and Perception Loss

2019-03-16

Uddeshya Upadhyay, Suyash P. Awate

arXiv_CV

arXiv_CV Super_Resolution GAN
Abstract

Super-resolution using deep neural networks typically relies on highly curated training sets that are often unavailable in clinical deployment scenarios. Using loss functions that assume Gaussian-distributed residuals makes the learning sensitive to corruptions in clinical training sets. We propose novel loss functions that are robust to corruptions in training sets by modeling heavy-tailed non-Gaussian distributions on the residuals. We also propose a loss based on an autoencoder-based manifold distance between the super-resolved and high-resolution images, to reproduce realistic textural content in super-resolved images. Finally, we propose to learn to super-resolve images to match human perceptions of structure, luminance, and contrast. Results on a large clinical dataset show the advantages of each of our contributions, and our framework improves over the state of the art.
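
To see why heavy-tailed residual models are more robust than the squared error implied by a Gaussian, compare how hard a single corrupted pixel pulls on the parameters. A Cauchy (Student's t with one degree of freedom) negative log-likelihood has a gradient that shrinks towards zero for large residuals, whereas the Gaussian gradient grows linearly. The Cauchy choice and the scale parameter `gamma` are assumptions for illustration; the abstract does not specify which heavy-tailed family the authors use.

```python
import math

def gaussian_nll(r, sigma=1.0):
    """Gaussian negative log-likelihood up to a constant: scaled squared error."""
    return 0.5 * (r / sigma) ** 2

def gaussian_grad(r, sigma=1.0):
    return r / sigma ** 2

def cauchy_nll(r, gamma=1.0):
    """Cauchy negative log-likelihood up to a constant: log(1 + (r/gamma)^2)."""
    return math.log1p((r / gamma) ** 2)

def cauchy_grad(r, gamma=1.0):
    # d/dr log(1 + r^2/gamma^2) = 2r / (gamma^2 + r^2)
    return 2.0 * r / (gamma ** 2 + r ** 2)

outlier = 50.0                  # residual at a hypothetical corrupted pixel
print(gaussian_grad(outlier))   # grows linearly with the residual
print(cauchy_grad(outlier))     # roughly 0.04, so the outlier barely matters
```

For small residuals both losses behave similarly, so clean pixels still drive the fit; only the influence of gross corruptions is capped.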

URL

http://arxiv.org/abs/1903.06920

PDF

http://arxiv.org/pdf/1903.06920
OffensEval at SemEval-2019 Task 6: Okham's Razor on Identifying and Categorizing Offensive Language in Social Media

2019-03-16

Silvia Sapora, Bogdan Lazarescu, Christo Lolov

arXiv_CL

arXiv_CL
Abstract

This document describes our approach to building an Offensive Language Classifier. More specifically, the OffensEval 2019 competition required us to build three classifiers with slightly different goals:
- Offensive language identification: classify a tweet as offensive or not.
- Automatic categorization of offense types: recognize whether the target of the offense was an individual or not.
- Offense target identification: identify whether the target of the offense is an individual, a group, or other.
In this report, we discuss the different architectures, algorithms and pre-processing strategies we tried, together with a detailed description of the designs of our final classifiers and the reasons we chose them over others. We evaluated our classifiers on the official test set provided for the OffensEval 2019 competition, obtaining macro-averaged F1-scores of 0.7189 for Task A, 0.6708 for Task B and 0.5442 for Task C.

URL

https://arxiv.org/abs/1903.05929

PDF

https://arxiv.org/pdf/1903.05929
A Cross-Season Correspondence Dataset for Robust Semantic Segmentation

2019-03-16

Måns Larsson, Erik Stenborg, Lars Hammarstrand, Torsten Sattler, Mark Pollefeys, Fredrik Kahl

arXiv_CV

arXiv_CV Segmentation CNN Semantic_Segmentation
Abstract

In this paper, we present a method that uses 2D-2D point matches between images taken under different conditions to train a convolutional neural network for semantic segmentation. Enforcing label consistency across the matches makes the final segmentation algorithm robust to seasonal changes. We describe how these 2D-2D matches can be generated with little human interaction by geometrically matching points from 3D models built from images. Two cross-season correspondence datasets are created, providing 2D-2D matches across seasonal changes as well as from day to night. The datasets are made publicly available to facilitate further research. We show that adding the correspondences as extra supervision during training improves the segmentation performance of the convolutional neural network, making it more robust to seasonal changes and weather conditions.

URL

http://arxiv.org/abs/1903.06916

PDF

http://arxiv.org/pdf/1903.06916
Combination of multiple Deep Learning architectures for Offensive Language Detection in Tweets

2019-03-16

Nicolò Frisiani, Alexis Laignelet, Batuhan Güler

arXiv_CL

arXiv_CL Deep_Learning Detection
Abstract

This report contains the details of our submission to OffensEval 2019 (SemEval-2019 Task 6). We first discuss the details of the classifier implemented, the type of input data used, and the preprocessing performed. We then move on to critically evaluating our performance. We achieved macro-averaged F1-scores of 0.76, 0.68, and 0.54 for Task A, Task B, and Task C, respectively, which we believe reflects the level of sophistication of the models implemented. Finally, we discuss the difficulties encountered and possible future improvements. Our code can be found at https://goo.gl/mdtuwF

URL

http://arxiv.org/abs/1903.08734

PDF

http://arxiv.org/pdf/1903.08734
Classification of dry age-related macular degeneration and diabetic macular edema from optical coherence tomography images using dictionary learning

2019-03-16

Elahe Mousavi, Rahele Kafieh, Hossein Rabbani

arXiv_CV

arXiv_CV Segmentation Classification
Abstract

Age-related Macular Degeneration (AMD) and Diabetic Macular Edema (DME) are the major causes of vision loss in developed countries. Alteration of the retinal layer structure and the appearance of exudate are the most significant signs of these diseases. We propose a classification algorithm for automatically distinguishing DME, AMD and normal subjects in Optical Coherence Tomography (OCT) images. The approach has two key aims: avoiding retinal layer segmentation, which is itself a challenging task, and identifying diseases in their early stages, when signs of disease appear in only a small fraction of B-scans. We used a histogram of oriented gradients (HOG) feature descriptor to characterize the distribution of local intensity gradients and edge directions. To capture the structure of the extracted features, we employed different dictionary learning-based classifiers. Our dataset consists of 45 subjects: 15 patients with AMD, 15 patients with DME and 15 normal subjects. The proposed classifier achieves accuracies of 95.13%, 100.00%, and 100.00% for DME, AMD, and normal OCT images, respectively, while considering only 4% of the B-scans in each volume, outperforming state-of-the-art methods.
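
A stripped-down histogram of oriented gradients shows what the descriptor measures: per-pixel gradient magnitude and orientation, accumulated into orientation bins over small cells. This NumPy sketch assumes unsigned orientations in 9 bins and 8x8-pixel cells, and omits block normalisation and bin interpolation, so it is a simplification of standard HOG rather than the exact descriptor configuration used in the paper.

```python
import numpy as np

def simple_hog(image, cell=8, n_bins=9):
    """Return a (cells_y, cells_x, n_bins) array of orientation histograms."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                    # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, quantised into n_bins bins
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((orientation / (180.0 / n_bins)).astype(int), n_bins - 1)

    cy, cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((cy, cx, n_bins))
    for i in range(cy):
        for j in range(cx):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            # Each pixel votes for its orientation bin, weighted by magnitude
            hist[i, j] = np.bincount(bins[rows, cols].ravel(),
                                     weights=magnitude[rows, cols].ravel(),
                                     minlength=n_bins)
    return hist

# Synthetic 16x16 ramp whose intensity increases left to right
ramp = np.tile(np.arange(16.0), (16, 1))
h = simple_hog(ramp)   # all gradient energy falls in the 0-degree bin
```

Concatenating these per-cell histograms gives the kind of feature vector that a dictionary-learning classifier can then model; retinal layer boundaries and exudates show up as characteristic orientation patterns without any explicit layer segmentation.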

URL

http://arxiv.org/abs/1903.06909

PDF

http://arxiv.org/pdf/1903.06909
Non-intrusive speech quality assessment using neural networks

2019-03-16

Anderson R. Avila, Hannes Gamper, Chandan Reddy, Ross Cutler, Ivan Tashev, Johannes Gehrke

arXiv_SD

arXiv_SD Relation Recommendation
Abstract

Estimating the perceived quality of an audio signal is critical for many multimedia and audio processing systems. Providers strive to offer optimal and reliable services in order to increase the user quality of experience (QoE). In this work, we investigate the applicability of neural networks to non-intrusive audio quality assessment. We propose three neural network-based approaches for mean opinion score (MOS) estimation. We compare our results to three instrumental measures: the perceptual evaluation of speech quality (PESQ), the ITU-T Recommendation P.563, and the speech-to-reverberation energy ratio. Our evaluation uses a speech dataset contaminated with convolutive and additive noise and labeled through crowd-based QoE evaluation; performance is measured by the Pearson correlation with the MOS labels and the mean squared error of the estimated MOS. Our proposed approaches outperform the aforementioned instrumental measures, with a fully connected deep neural network using Mel-frequency features providing the best correlation (0.87) and the lowest mean squared error (0.15).
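
The two evaluation criteria named above, Pearson correlation with the crowd-sourced MOS labels and mean squared error of the estimated MOS, are straightforward to compute. The sketch below uses only the Python standard library; the score lists in the example are made-up placeholders, not data from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mse(x, y):
    """Mean squared error between labels and estimates."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

mos_labels = [1.5, 2.8, 3.1, 4.2, 4.6]      # hypothetical crowd MOS labels
mos_predicted = [1.7, 2.6, 3.4, 4.0, 4.5]   # hypothetical model estimates
print(pearson(mos_labels, mos_predicted), mse(mos_labels, mos_predicted))
```

Reporting both matters: correlation rewards getting the ranking of clips right, while MSE also penalises a constant offset between the estimated and true scores.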

URL

http://arxiv.org/abs/1903.06908

PDF

http://arxiv.org/pdf/1903.06908
HexaShrink, an exact scalable framework for hexahedral meshes with attributes and discontinuities: multiresolution rendering and storage of geoscience models

2019-03-16

Jean-Luc Peyrot, Laurent Duval, Frédéric Payan, Lauriane Bouard, Lénaïc Chizat, Sébastien Schneider, Marc Antonini

arXiv_CV

arXiv_CV
Abstract

With the major progress in data acquisition over the past decades, and acquisition systems now able to produce high-resolution grids and point clouds, the digitization of physical terrains is becoming increasingly precise. Such extreme quantities of generated and modeled data greatly impact computational performance at many levels of high-performance computing (HPC): storage media, memory requirements, transfer capability, and finally simulation interactivity, which is necessary to exploit this instance of big data. Efficient representations and storage are thus becoming “enabling technologies” in HPC experimental and simulation science. We propose HexaShrink, an original decomposition scheme for structured hexahedral volume meshes, which are used, for instance, in biomedical engineering, materials science, and geosciences. HexaShrink provides a comprehensive framework for efficient mesh visualization and storage. Its exactly reversible multiresolution decomposition yields a hierarchy of meshes with increasing levels of detail, in terms of geometry as well as continuous or categorical cell properties. Starting with an overview of volume mesh compression techniques, our contribution coherently blends different multiresolution wavelet schemes in different dimensions. The result is a global framework that preserves discontinuities (faults) across scales, implemented as a fully reversible upscaling at different resolutions. Experimental results are provided on meshes of varying size and complexity. They emphasize the consistency of the proposed representation in terms of visualization, attribute downsampling and distribution at different resolutions. Finally, HexaShrink yields gains in storage space when combined with lossless compression techniques.
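
Exactly reversible multiresolution decompositions are commonly built from integer lifting steps. The sketch below shows one level of the integer Haar (S) transform on a 1D list of integer attribute values: it splits the signal into a coarse approximation (the floored pairwise mean) and detail coefficients, and reconstructs the original values bit-exactly. This is a generic illustration of the exact-reversibility property only; HexaShrink's actual combination of wavelet schemes across dimensions, and its handling of faults, are not reproduced here.

```python
def haar_forward(values):
    """One level of the integer Haar (S) transform; len(values) must be even."""
    evens, odds = values[0::2], values[1::2]
    details = [o - e for e, o in zip(evens, odds)]            # predict step
    # Update step: e + floor(d / 2) equals floor((e + o) / 2), the coarse value
    coarse = [e + (d >> 1) for e, d in zip(evens, details)]
    return coarse, details

def haar_inverse(coarse, details):
    """Exact inverse of haar_forward: undo the update, then the predict step."""
    out = []
    for c, d in zip(coarse, details):
        e = c - (d >> 1)
        out.extend([e, e + d])
    return out

values = [5, 7, 3, -2, 10, 10, 0, 1]     # hypothetical integer cell attributes
coarse, details = haar_forward(values)
print(coarse, details)                   # [6, 0, 10, 0] [2, -5, 0, 1]
print(haar_inverse(coarse, details) == values)   # True
```

Applying `haar_forward` again to `coarse` gives the next level of the hierarchy; because every step uses integer arithmetic with Python's floor-shifting `>>`, no rounding error accumulates across levels.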

URL

http://arxiv.org/abs/1903.07614

PDF

http://arxiv.org/pdf/1903.07614
Emotion Action Detection and Emotion Inference: the Task and Dataset

2019-03-16

Pengyuan Liu, Chengyu Du, Shuofeng Zhao, Chenghao Zhu

arXiv_CL

arXiv_CL Inference Classification Detection
Abstract

Many Natural Language Processing studies on emotion analysis focus only on simple emotion classification, without exploring the potential of placing emotion in an “event context”, and ignore the analysis of emotion-related events. One main reason is the lack of a suitable corpus. Here we present the Cause-Emotion-Action Corpus, which manually annotates not only emotions but also cause events and action events. We propose two new tasks based on the dataset: emotion causality and emotion inference. The first task is to extract a triple (cause, emotion, action). The second task is to infer the probable emotion. We are releasing the dataset, with 10,603 samples and 15,892 events, together with a basic statistical analysis and baselines for both the emotion causality and emotion inference tasks. Baseline performance shows that there is much room for improvement on both tasks.

URL

http://arxiv.org/abs/1903.06901

PDF

http://arxiv.org/pdf/1903.06901
Concatenated Feature Pyramid Network for Instance Segmentation

2019-03-16

Yongqing Sun, Pranav Shenoy K P, Jun Shimamura, Atsushi Sagata

arXiv_CV

arXiv_CV Object_Detection Segmentation Detection Relation
Abstract

Low-level features such as edges and textures play an important role in accurately localizing instances in neural networks. In this paper, we propose an architecture that improves the feature pyramid networks commonly used in instance segmentation networks by incorporating low-level features into all layers of the pyramid in an optimal and efficient way. Specifically, we introduce a new layer that holistically learns new correlations from feature maps of multiple feature pyramid levels and enhances the semantic information of the feature pyramid to improve accuracy. Our architecture is simple to implement in instance segmentation or object detection frameworks to boost accuracy. Using this method in Mask RCNN, our model achieves consistent improvement in precision on the COCO dataset with the computational overhead compared to the original feature pyramid network.

URL

http://arxiv.org/abs/1904.00768

PDF

http://arxiv.org/pdf/1904.00768
Ontology Based Global and Collective Motion Patterns for Event Classification in Basketball Videos

2019-03-16

Lifang Wu, Zhou Yang, Jiaoyu He, Meng Jian, Yaowen Xu, Dezhong Xu, Chang Wen Chen

arXiv_CV

arXiv_CV Ontology CNN RNN Classification
Abstract

In multi-person videos, especially team sport videos, a semantic event is usually represented as a confrontation between two teams of players, which can be represented as collective motion. In broadcast basketball videos, specific camera motions are used to present specific events. Therefore, a semantic event in broadcast basketball videos is closely related to both the global motion (camera motion) and the collective motion. A semantic event in basketball videos can generally be divided into three stages: pre-event, event occurrence (event-occ), and post-event. In this paper, we propose an ontology-based global and collective motion pattern (On_GCMP) algorithm for basketball event classification. First, a two-stage GCMP-based event classification scheme is proposed. The GCMP is extracted using optical flow. The two-stage scheme progressively combines a five-class event classification algorithm on event-occs and a two-class event classification algorithm on pre-events. Both algorithms use sequential convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to extract the spatial and temporal features of GCMP for event classification. Second, we use post-event segments to predict success/failure with algorithms based on deep features of images in the video frames (RGB_DF_VF). Finally, the event classification results and success/failure classification results are integrated to obtain the final results. To evaluate the proposed scheme, we collected a new dataset called NCAA+, which is automatically obtained from the NCAA dataset by extending the fixed-length video clips forward and backward around the corresponding semantic events. The experimental results demonstrate that the proposed scheme achieves a mean average precision of 59.22% on NCAA+, which is 7.62% higher than the state of the art on NCAA.

URL

http://arxiv.org/abs/1903.06879

PDF

http://arxiv.org/pdf/1903.06879
Fast Interactive Object Annotation with Curve-GCN

2019-03-16

Huan Ling, Jun Gao, Amlan Kar, Wenzheng Chen, Sanja Fidler

arXiv_CV

arXiv_CV CNN RNN
Abstract

Manually labeling objects by tracing their boundaries is a laborious process. In Polygon-RNN++, the authors proposed Polygon-RNN, which produces polygonal annotations in a recurrent manner using a CNN-RNN architecture and allows interactive correction via humans-in-the-loop. We propose a new framework that alleviates the sequential nature of Polygon-RNN by predicting all vertices simultaneously using a Graph Convolutional Network (GCN). Our model is trained end-to-end. It supports object annotation with either polygons or splines, improving labeling efficiency for both line-based and curved objects. We show that Curve-GCN outperforms all existing approaches in automatic mode, including the powerful PSP-DeepLab, and is significantly more efficient in interactive mode than Polygon-RNN++. Our model runs at 29.3 ms in automatic mode and 2.6 ms in interactive mode, making it 10x and 100x faster, respectively, than Polygon-RNN++.
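The key difference from Polygon-RNN is that all vertices are updated at once rather than one after another. The sketch below illustrates that idea with one simplified graph-convolution step on the cycle graph of a closed polygon: each vertex mixes its own 2-D offset proposal with those of its two neighbours, and every control point is then moved in a single parallel update. The scalar weights `w_self` and `w_nbr` stand in for learned weight matrices, and all helper names are hypothetical; this is not the Curve-GCN architecture itself.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cycle_neighbors(i: int, n: int) -> Tuple[int, int]:
    """Previous and next vertex indices on a closed polygon with n vertices."""
    return (i - 1) % n, (i + 1) % n


def gcn_step(features: List[Point], w_self: float, w_nbr: float) -> List[Point]:
    """One simplified graph-convolution step on the polygon's cycle graph.

    Each output depends on the vertex's own 2-D feature and its two neighbours'.
    All outputs are computed from the same old features, i.e. in parallel rather
    than vertex by vertex. Scalar weights replace learned weight matrices.
    """
    n = len(features)
    out: List[Point] = []
    for i, (x, y) in enumerate(features):
        p, q = cycle_neighbors(i, n)
        sx = features[p][0] + features[q][0]
        sy = features[p][1] + features[q][1]
        out.append((w_self * x + w_nbr * sx, w_self * y + w_nbr * sy))
    return out


def apply_offsets(vertices: List[Point], offsets: List[Point]) -> List[Point]:
    """Move every control point by its offset in a single simultaneous update."""
    return [(vx + dx, vy + dy) for (vx, vy), (dx, dy) in zip(vertices, offsets)]


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    proposals = [(1.0, 0.0)] * 4  # hypothetical per-vertex offset proposals
    offsets = gcn_step(proposals, w_self=0.5, w_nbr=0.25)
    print(apply_offsets(square, offsets))  # [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)]
```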

URL

http://arxiv.org/abs/1903.06874

PDF

http://arxiv.org/pdf/1903.06874
Clustering of Driving Encounter Scenarios Using Connected Vehicle Trajectories

2019-03-16

Wenshuo Wang, Aditya Ramesh, Ding Zhao

arXiv_RO

arXiv_RO Knowledge Classification
Abstract

Multi-vehicle interaction behavior classification and analysis provide in-depth knowledge that supports efficient decision-making for autonomous vehicles. This paper aims to cluster a wide range of driving encounter scenarios based only on multi-vehicle GPS trajectories. To this end, we propose a generic unsupervised learning framework comprising two layers: a feature representation layer and a clustering layer. In the feature representation layer, we combine deep autoencoders with a distance-based measure to map the sequential observations of driving encounters into a computationally tractable space that allows quantifying the spatiotemporal interaction characteristics of two vehicles. A clustering algorithm is then applied to the extracted representations to gather homogeneous driving encounters into groups. The proposed framework is evaluated on 2,568 naturalistic driving encounters. Experimental results demonstrate that the proposed unsupervised framework can cluster multi-trajectory data into distinct groups. These clustering results could benefit the analysis and design of decision-making policies for autonomous vehicles.
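The abstract does not name the clustering algorithm or give autoencoder details. As an assumption-laden sketch of the two-layer idea, the code below uses only a distance-based representation (inter-vehicle distance resampled to a fixed length, with no autoencoder) and swaps in plain k-means as the clustering layer. Trajectories are assumed to be already projected from GPS to planar coordinates in metres; `distance_profile` and `kmeans` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar position in metres (assumed already projected from GPS)


def distance_profile(traj_a: Sequence[Point], traj_b: Sequence[Point], length: int) -> List[float]:
    """Inter-vehicle distance sampled at `length` evenly spaced time steps."""
    n = min(len(traj_a), len(traj_b))
    if n == 0 or length < 1:
        raise ValueError("need non-empty trajectories and length >= 1")
    # Nearest-sample resampling so that encounters of different durations
    # map to fixed-length feature vectors.
    idx = [round(k * (n - 1) / max(length - 1, 1)) for k in range(length)]
    return [math.dist(traj_a[i], traj_b[i]) for i in idx]


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means; the first k points serve as initial centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid in Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    lead = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    # Two close-following encounters and two distant ones (lateral gaps in metres).
    encounters = [(lead, [(x, y + dy) for x, y in lead]) for dy in (1.0, 20.0, 1.5, 25.0)]
    features = [distance_profile(a, b, length=3) for a, b in encounters]
    print(kmeans(features, k=2))  # [0, 1, 0, 1]
```

Initialising from the first k points keeps the example deterministic; in practice a randomised or k-means++ initialisation with several restarts is the usual choice.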

URL

http://arxiv.org/abs/1807.08415

PDF

http://arxiv.org/pdf/1807.08415
Domain Generalization by Solving Jigsaw Puzzles

2019-03-16

Fabio Maria Carlucci, Antonio D'Innocente, Silvia Bucci, Barbara Caputo, Tatiana Tommasi

arXiv_CV

arXiv_CV Knowledge Classification Relation Recognition
Abstract

Human adaptability relies crucially on the ability to learn and merge knowledge from both supervised and unsupervised learning: parents point out a few important concepts, and children then fill in the gaps on their own. This is particularly effective because supervised learning can never be exhaustive, so learning autonomously makes it possible to discover invariances and regularities that help generalization. In this paper, we propose to apply a similar approach to the task of object recognition across domains: our model learns the semantic labels in a supervised fashion and broadens its understanding of the data by learning, from self-supervised signals, how to solve a jigsaw puzzle on the same images. This secondary task helps the network learn the concepts of spatial correlation while acting as a regularizer for the classification task. Multiple experiments on the PACS, VLCS, Office-Home, and digits datasets confirm our intuition and show that this simple method outperforms previous domain generalization and adaptation solutions. An ablation study further illustrates the inner workings of our approach.
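The jigsaw side task can be sketched as follows, assuming a square grayscale image stored as a list of rows: the image is cut into `grid x grid` tiles, the tiles are reordered with one permutation from a fixed set, and the index of that permutation becomes the self-supervised label. The helper names, the toy permutation set, and the weighted sum in `joint_loss` (with its `alpha` parameter) are assumptions for illustration; the abstract only says the jigsaw task acts as a regularizer for classification.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # square grayscale image as a list of rows


def split_tiles(image: Image, grid: int) -> List[Image]:
    """Cut the image into grid x grid equal tiles, listed in row-major order."""
    size = len(image)
    if size % grid:
        raise ValueError("image side must be divisible by grid")
    t = size // grid
    return [
        [row[c * t:(c + 1) * t] for row in image[r * t:(r + 1) * t]]
        for r in range(grid)
        for c in range(grid)
    ]


def assemble(tiles: List[Image], grid: int) -> Image:
    """Inverse of split_tiles: place tiles back in row-major order."""
    t = len(tiles[0])
    image: Image = []
    for r in range(grid):
        for y in range(t):
            row: List[int] = []
            for c in range(grid):
                row.extend(tiles[r * grid + c][y])
            image.append(row)
    return image


def make_jigsaw(image: Image, grid: int, permutations: Sequence[Sequence[int]],
                label: int) -> Tuple[Image, int]:
    """Shuffle tiles with permutations[label]; the self-supervised target is `label`."""
    tiles = split_tiles(image, grid)
    shuffled = [tiles[i] for i in permutations[label]]
    return assemble(shuffled, grid), label


def joint_loss(cls_loss: float, jigsaw_loss: float, alpha: float) -> float:
    """Assumed combined objective: classification loss plus weighted jigsaw loss."""
    return cls_loss + alpha * jigsaw_loss


if __name__ == "__main__":
    img = [[4 * r + c for c in range(4)] for r in range(4)]
    perms = [(0, 1, 2, 3), (3, 2, 1, 0)]  # hypothetical permutation set
    puzzle, target = make_jigsaw(img, grid=2, permutations=perms, label=1)
    print(puzzle[0], target)  # [10, 11, 8, 9] 1
```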

URL

http://arxiv.org/abs/1903.06864

PDF

http://arxiv.org/pdf/1903.06864
Learning Super-resolution 3D Segmentation of Plant Root MRI Images from Few Examples

2019-03-16

Ali Oguz Uzman, Jannis Horn, Sven Behnke

arXiv_CV

arXiv_CV Super_Resolution Segmentation
Abstract

Analyzing plant roots is crucial to understanding plant performance in different soil environments. While magnetic resonance imaging (MRI) can be used to obtain 3D images of plant roots, extracting the root structural model is challenging due to highly noisy soil environments and the low resolution of MRI images. To improve both contrast and resolution, we adapt the state-of-the-art method RefineNet for 3D segmentation of plant root MRI images in super-resolution. The networks are trained on only a few manual segmentations, which are augmented by geometric transformations, realistic noise, and other variabilities. The resulting segmentations contain most root structures, including branches not extracted by the human annotator.
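One detail of training from few examples worth making concrete is that geometric augmentations must be applied identically to the MRI volume and to its manual segmentation, while intensity noise should touch only the image. The sketch below assumes volumes stored as nested lists indexed `[z][y][x]`, uses random axis flips as the only geometric transform, and substitutes additive Gaussian noise for the paper's "realistic noise"; the function names and the `noise_std` default are hypothetical.

```python
import random
from typing import List, Tuple

Volume = List[List[List[float]]]  # indexed as volume[z][y][x]


def flip(volume: Volume, axis: int) -> Volume:
    """Return a copy of the volume mirrored along axis 0 (z), 1 (y) or 2 (x)."""
    if axis == 0:
        return [[row[:] for row in plane] for plane in reversed(volume)]
    if axis == 1:
        return [[row[:] for row in reversed(plane)] for plane in volume]
    if axis == 2:
        return [[row[::-1] for row in plane] for plane in volume]
    raise ValueError("axis must be 0, 1 or 2")


def augment(image: Volume, label: Volume, rng: random.Random,
            noise_std: float = 0.05) -> Tuple[Volume, Volume]:
    """Randomly flip image and label together, then add Gaussian noise to the image only."""
    for axis in range(3):
        if rng.random() < 0.5:
            # Same geometric transform for both, so voxels stay aligned with their labels.
            image = flip(image, axis)
            label = flip(label, axis)
    noisy = [[[v + rng.gauss(0.0, noise_std) for v in row] for row in plane] for plane in image]
    return noisy, label


if __name__ == "__main__":
    mask = [[[0, 1], [0, 0]], [[0, 1], [1, 1]]]  # toy 2x2x2 root mask
    mri = [[[float(v) for v in row] for row in plane] for plane in mask]
    aug_img, aug_mask = augment(mri, mask, random.Random(0), noise_std=0.1)
```

Passing an explicit `random.Random` instance keeps each augmentation reproducible from a seed, which helps when debugging a pipeline trained on very few volumes.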

URL

http://arxiv.org/abs/1903.06855

PDF

http://arxiv.org/pdf/1903.06855
Safe Coordination of Human-Robot Firefighting Teams

2019-03-16

Esmaeil Seraj, Andrew Silva, Matthew Gombolay

arXiv_AI

arXiv_AI Tracking
Abstract

Wildfires are destructive and inflict massive, irreversible harm on victims’ lives and natural resources. Researchers have proposed commissioning unmanned aerial vehicles (UAVs) to provide firefighters with real-time tracking information; however, these UAVs are not able to reason about a fire’s track, including its current location, measurement, and uncertainty, as well as its propagation. We propose a model-predictive, probabilistically safe distributed control algorithm for human-robot collaboration in wildfire fighting. The proposed algorithm overcomes the limitations of prior work by explicitly estimating the latent fire propagation dynamics to enable intelligent, time-extended coordination of the UAVs in support of on-the-ground human firefighters. We derive a novel, analytical bound that enables UAVs to distribute their resources and provides a probabilistic guarantee of the humans’ safety while preserving the UAVs’ ability to cover an entire fire.
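To give a feel for what a probabilistic safety guarantee means in this setting, the sketch below models a 1-D fire front whose predicted position at the end of a planning horizon is Gaussian, with a mean that advances at a constant spread rate and a variance that grows linearly with time. A plan is accepted only if the probability that the front lies beyond a firefighter's position at the end of the horizon stays at or below a tolerance `delta`. The model, parameter names, and functions are hypothetical simplifications, not the analytical bound derived in the paper.

```python
import math


def normal_cdf(x: float, mean: float, std: float) -> float:
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def breach_probability(front: float, spread_rate: float, sigma0: float,
                       sigma_growth: float, human: float, horizon: float) -> float:
    """P(front position at `horizon` lies beyond `human`) for a 1-D Gaussian front model.

    Assumed model: mean = front + spread_rate * horizon,
                   variance = sigma0**2 + sigma_growth**2 * horizon.
    """
    mean = front + spread_rate * horizon
    std = math.sqrt(sigma0 ** 2 + sigma_growth ** 2 * horizon)
    if std <= 0.0:
        # Degenerate (deterministic) prediction.
        return 1.0 if mean > human else 0.0
    return 1.0 - normal_cdf(human, mean, std)


def is_probabilistically_safe(front: float, spread_rate: float, sigma0: float,
                              sigma_growth: float, human: float, horizon: float,
                              delta: float) -> bool:
    """Accept a plan only if the breach probability is at most `delta`."""
    p = breach_probability(front, spread_rate, sigma0, sigma_growth, human, horizon)
    return p <= delta


if __name__ == "__main__":
    # Front at 0 m spreading at 1 m/s; firefighter exactly where the mean front lands.
    print(breach_probability(0.0, 1.0, 1.0, 0.0, 10.0, 10.0))  # 0.5
```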

URL

http://arxiv.org/abs/1903.06847

PDF

http://arxiv.org/pdf/1903.06847