Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Solo or Ensemble? Choosing a CNN Architecture for Melanoma Classification

2019-04-29

Fábio Perez, Sandra Avila, Eduardo Valle

arXiv_CV

arXiv_CV CNN Transfer_Learning Classification Relation
Abstract

Convolutional neural networks (CNNs) deliver exceptional results for computer vision, including medical image analysis. With the growing number of available architectures, picking one over another is far from obvious. Existing art suggests that, when performing transfer learning, the performance of CNN architectures on ImageNet correlates strongly with their performance on target tasks. We evaluate that claim for melanoma classification, over 9 CNN architectures, in 5 sets of splits created on the ISIC Challenge 2017 dataset, and 3 repeated measures, resulting in 135 models. The correlations we found were, to begin with, much smaller than those reported by existing art, and disappeared altogether when we considered only the top-performing networks: uncontrolled nuisances (i.e., splits and randomness) overcome any of the analyzed factors. Whenever possible, the best approach for melanoma classification is still to create ensembles of multiple models. We compared two choices for selecting which models to ensemble: picking them at random (among a pool of high-quality ones) vs. using the validation set to determine which ones to pick first. For small ensembles, we found a slight advantage for the second approach but found that random choice was also competitive. Although our aim in this paper was not to maximize performance, we easily reached AUCs comparable to the first place on the ISIC Challenge 2017.
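The two ensemble-selection strategies compared in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each model outputs one melanoma probability per image, that the ensemble simply averages those probabilities, and that models are ranked by validation AUC. The function names (`auc`, `ensemble`, `pick_by_validation`, `pick_at_random`) are hypothetical.

```python
import numpy as np

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def ensemble(prob_list):
    """Average the per-model predicted probabilities (assumed fusion rule)."""
    return np.mean(np.stack([np.asarray(p, dtype=float) for p in prob_list]), axis=0)

def pick_by_validation(val_labels, val_probs, k):
    """Indices of the k models with the highest validation AUC."""
    aucs = [auc(val_labels, p) for p in val_probs]
    return sorted(range(len(val_probs)), key=lambda i: aucs[i], reverse=True)[:k]

def pick_at_random(n_models, k, seed=0):
    """Indices of k distinct models drawn uniformly at random."""
    rng = np.random.default_rng(seed)
    return sorted(rng.choice(n_models, size=k, replace=False).tolist())
```

Either selector returns model indices; the test-set predictions of the chosen models would then be passed to `ensemble` and scored with `auc`.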

URL

http://arxiv.org/abs/1904.12724

PDF

http://arxiv.org/pdf/1904.12724
A New Method for Atlanta World Frame Estimation

2019-04-29

Yinlong Liu, Alois Knoll, Guang Chen

arXiv_CV

arXiv_CV Relation
Abstract

In this paper, we propose a new Atlanta frame estimation method by considering the relationship between vertical direction and horizontal directions. Unlike previous solutions, our method does not solve all the directions at one time. On the contrary, it estimates the directions sequentially. Concretely, our method first searches the vertical direction in $\mathbb{S}^2$ globally, then estimates the horizontal directions in one dimension. As a consequence, the dimensionality of each subproblem is low and it can be solved efficiently. In other words, the running time of our method will not greatly increase as the number of horizontal directions increases. The advantages of our method are validated via testing on both synthetic and real-world data.
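Why the horizontal directions become a one-dimensional problem once the vertical direction is fixed can be shown with a small sketch. It covers only the parametrization, not the paper's search procedure, and the helper names `horizontal_basis` and `horizontal_direction` are hypothetical: every horizontal direction lies in the plane orthogonal to the vertical, so a single angle describes it.

```python
import numpy as np

def horizontal_basis(vertical):
    """Orthonormal basis (u, w) of the plane orthogonal to `vertical`."""
    v = np.asarray(vertical, dtype=float)
    v = v / np.linalg.norm(v)
    # Use the coordinate axis least aligned with v to avoid a degenerate cross product.
    helper = np.eye(3)[np.argmin(np.abs(v))]
    u = np.cross(v, helper)
    u = u / np.linalg.norm(u)
    w = np.cross(v, u)
    return u, w

def horizontal_direction(vertical, angle):
    """Unit horizontal direction described by one angle in the horizontal plane."""
    u, w = horizontal_basis(vertical)
    return np.cos(angle) * u + np.sin(angle) * w
```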

URL

http://arxiv.org/abs/1904.12717

PDF

http://arxiv.org/pdf/1904.12717
DeLiO: Decoupled LiDAR Odometry

2019-04-29

Queens Maria Thomas, Oliver Wasenmüller, Didier Stricker

arXiv_CV

arXiv_CV Face Tracking
Abstract

Most LiDAR odometry algorithms estimate the transformation between two consecutive frames by estimating the rotation and translation in an intervening fashion. In this paper, we propose our Decoupled LiDAR Odometry (DeLiO), which, for the first time, decouples the rotation estimation completely from the translation estimation. In particular, the rotation is estimated by extracting the surface normals from the input point clouds and tracking their characteristic pattern on a unit sphere. Using this rotation, the point clouds are unrotated so that the underlying transformation is pure translation, which can be easily estimated using a line cloud approach. An evaluation is performed on the KITTI dataset and the results are compared against state-of-the-art algorithms.
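The decoupling idea can be illustrated with a toy sketch. It is not DeLiO itself: the rotation is assumed to be already known, point correspondences between the two frames are assumed to be given, and the translation is recovered by simple averaging rather than by the paper's line cloud approach. The function names are hypothetical.

```python
import numpy as np

def rotation_z(theta):
    """Rotation matrix about the z axis (used only to build a toy example)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def estimate_translation(source, target, rotation):
    """Recover t from target_i = rotation @ source_i + t, given the rotation.

    source, target: (N, 3) arrays of corresponding points (an assumption;
    real LiDAR scans do not come with known correspondences).
    """
    # Unrotate the target cloud: each row becomes rotation.T @ target_i.
    unrotated = target @ rotation
    # After unrotation only a pure translation (rotation.T @ t) remains.
    offset = np.mean(unrotated - source, axis=0)
    return rotation @ offset
```

Once the rotation is removed, the remaining unknown has three degrees of freedom instead of six, which is what makes the second step simple.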

URL

http://arxiv.org/abs/1904.12667

PDF

http://arxiv.org/pdf/1904.12667
A dual branch deep neural network for classification and detection in mammograms

2019-04-29

Ran Bakalo, Jacob Goldberger, Rami Ben-Ari

arXiv_CV

arXiv_CV Weakly_Supervised Classification Deep_Learning Detection
Abstract

In this paper, we propose a novel deep learning architecture for joint classification and localization of abnormalities in mammograms. We first assume a weakly supervised setting and present a new approach with data driven decisions. This novel network combines two learning branches with region-level classification and region ranking. The network provides a global classification of the image into multiple classes, such as malignant, benign or normal. Our method further enables the localization of abnormalities as global class discriminative regions in full mammogram resolution. Next, we extend this method to a semi-supervised setting that engages a small set of local annotations, using a novel architecture, and a multi-task objective function. We present the impact of the local annotations on several performance measures, including localization, to evaluate the cost effectiveness of lesion annotation effort. Our evaluation is made over a large multi-center mammography dataset of $\sim$3,000 mammograms with various findings. Experimental results demonstrate the capabilities and advantages of the proposed method over previous weakly-supervised strategies, and the impact of semi-supervised learning. We show that targeting the annotation of only 5% of the images can significantly boost performance.

URL

http://arxiv.org/abs/1904.12589

PDF

http://arxiv.org/pdf/1904.12589
Affective EEG-Based Person Identification Using the Deep Learning Approach

2019-04-29

Theerawit Wilaiprasitporn, Apiwat Ditthapron, Karis Matchaparn, Tanaboon Tongbuasirilai, Nannapas Banluesombatkul, Ekapol Chuangsuwanich

arXiv_CV

arXiv_CV CNN RNN Deep_Learning Recognition
Abstract

Electroencephalography (EEG) is another mode for performing Person Identification (PI). Due to the nature of the EEG signals, EEG-based PI is typically done while the person is performing some kind of mental task, such as motor control. However, few works have considered EEG-based PI while the person is in different mental states (affective EEG). The aim of this paper is to improve the performance of affective EEG-based PI using a deep learning approach. We propose a cascade of deep learning using a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). CNNs are used to handle the spatial information from the EEG while RNNs extract the temporal information. We evaluated two types of RNNs, namely, Long Short-Term Memory (CNN-LSTM) and Gated Recurrent Unit (CNN-GRU). The proposed method is evaluated on the state-of-the-art affective dataset DEAP. The results indicate that CNN-GRU and CNN-LSTM can perform PI from different affective states and reach up to 99.90–100% mean Correct Recognition Rate (CRR), significantly outperforming a support vector machine (SVM) baseline system that uses power spectral density (PSD) features. Notably, the 100% mean CRR comes from only 40 subjects in the DEAP dataset. When the number of EEG electrodes is reduced from thirty-two to five for more practical applications, the frontal region gives the best results, reaching up to 99.17% CRR (from CNN-GRU). Amongst the two deep learning models, we find CNN-GRU to slightly outperform CNN-LSTM, while having faster training time. Furthermore, CNN-GRU overcomes the influence of affective states in EEG-based PI reported in the previous works.

URL

http://arxiv.org/abs/1807.03147

PDF

http://arxiv.org/pdf/1807.03147
Semantic Matching of Documents from Heterogeneous Collections: A Simple and Transparent Method for Practical Applications

2019-04-29

Mark-Christoph Müller

arXiv_CL

arXiv_CL Classification
Abstract

We present a very simple, unsupervised method for the pairwise matching of documents from heterogeneous collections. We demonstrate our method with the Concept-Project matching task, which is a binary classification task involving pairs of documents from heterogeneous collections. Although our method only employs standard resources without any domain- or task-specific modifications, it clearly outperforms the more complex system of the original authors. In addition, our method is transparent, because it provides explicit information about how a similarity score was computed, and efficient, because it is based on the aggregation of (pre-computable) word-level similarities.
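A rough sketch of what aggregating word-level similarities into a transparent document score can look like is given below. The aggregation rule (best-match cosine similarity per word, then averaging) and the toy two-dimensional embeddings are assumptions made for illustration; the abstract does not specify the exact rule, and `match` is a hypothetical name.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def match(doc_a, doc_b, emb):
    """Score doc_a against doc_b and return the word alignments behind the score.

    doc_a, doc_b: lists of tokens; emb: dict mapping token -> vector.
    """
    explanation = []
    for word in doc_a:
        # Word-level similarities depend only on the vocabulary,
        # so in practice they could be pre-computed once.
        best = max(doc_b, key=lambda other: cosine(emb[word], emb[other]))
        explanation.append((word, best, cosine(emb[word], emb[best])))
    score = sum(sim for _, _, sim in explanation) / len(explanation)
    return score, explanation

# Toy embeddings, purely illustrative.
toy_emb = {"cat": [1.0, 0.0], "dog": [0.8, 0.6], "car": [0.0, 1.0]}
```

The returned `explanation` list is what makes such a score inspectable: each entry names the word pair that contributed and how much.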

URL

http://arxiv.org/abs/1904.12550

PDF

http://arxiv.org/pdf/1904.12550
ConvTimeNet: A Pre-trained Deep Convolutional Neural Network for Time Series Classification

2019-04-29

Kathan Kashiparekh, Jyoti Narwariya, Pankaj Malhotra, Lovekesh Vig, Gautam Shroff

arXiv_AI

arXiv_AI CNN Classification
Abstract

Training deep neural networks often requires careful hyper-parameter tuning and significant computational resources. In this paper, we propose ConvTimeNet (CTN): an off-the-shelf deep convolutional neural network (CNN) trained on diverse univariate time series classification (TSC) source tasks. Once trained, CTN can be easily adapted to new TSC target tasks via a small amount of fine-tuning using labeled instances from the target tasks. We note that the length of convolutional filters is a key aspect when building a pre-trained model that can generalize to time series of different lengths across datasets. To achieve this, we incorporate filters of multiple lengths in all convolutional layers of CTN to capture temporal features at multiple time scales. We consider all 65 datasets with time series of lengths up to 512 points from the UCR TSC Benchmark for training and testing transferability of CTN: we train CTN on a randomly chosen subset of 24 datasets using a multi-head approach with a different softmax layer for each training dataset, and study generalizability and transferability of the learned filters on the remaining 41 TSC datasets. We observe significant gains in classification accuracy as well as computational efficiency when using pre-trained CTN as a starting point for subsequent task-specific fine-tuning compared to existing state-of-the-art TSC approaches. We also provide qualitative insights into the working of CTN by: i) analyzing the activations and filters of the first convolution layer, suggesting the filters in CTN are generically useful, ii) analyzing the impact of the design decision to incorporate multiple length decisions, and iii) finding regions of time series that affect the final classification decision via occlusion sensitivity analysis.
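The multi-length filter idea can be illustrated with a minimal NumPy sketch. It is not the CTN architecture: the filter lengths (4, 8, 16), the moving-average weights, and the name `multi_scale_features` are assumptions chosen for illustration, and in a trained network the filter weights would be learned.

```python
import numpy as np

def multi_scale_features(series, lengths=(4, 8, 16)):
    """Convolve one univariate series with filters of several lengths.

    Returns an array of shape (len(lengths), len(series)), one row per
    filter length. Assumes the series is at least as long as the longest filter.
    """
    series = np.asarray(series, dtype=float)
    outputs = []
    for length in lengths:
        # Moving-average kernel as a stand-in for a learned filter.
        kernel = np.full(length, 1.0 / length)
        outputs.append(np.convolve(series, kernel, mode="same"))
    return np.stack(outputs)
```

Short filters respond to fast local changes and long filters to slower trends, so stacking them gives the next layer features at several time scales regardless of the series length.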

URL

http://arxiv.org/abs/1904.12546

PDF

http://arxiv.org/pdf/1904.12546
Logician: A Unified End-to-End Neural Approach for Open-Domain Information Extraction

2019-04-29

Mingming Sun, Xu Li, Xin Wang, Miao Fan, Yue Feng, Ping Li

arXiv_AI

arXiv_AI Knowledge Relation_Extraction Attention Optimization Relation
Abstract

In this paper, we consider the problem of open information extraction (OIE) for extracting entity and relation level intermediate structures from sentences in open-domain. We focus on four types of valuable intermediate structures (Relation, Attribute, Description, and Concept), and propose a unified knowledge expression form, SAOKE, to express them. We publicly release a data set which contains more than forty thousand sentences and the corresponding facts in the SAOKE format labeled by crowd-sourcing. To our knowledge, this is the largest publicly available human labeled data set for open information extraction tasks. Using this labeled SAOKE data set, we train an end-to-end neural model using the sequence-to-sequence paradigm, called Logician, to transform sentences into facts. For each sentence, unlike existing algorithms, which generally focus on extracting each single fact without concerning other possible facts, Logician performs a global optimization over all possible involved facts, in which facts not only compete with each other to attract the attention of words, but also cooperate to share words. An experimental study on various types of open domain relation extraction tasks reveals the consistent superiority of Logician to other state-of-the-art algorithms. The experiments verify the reasonableness of the SAOKE format, the value of the SAOKE data set, the effectiveness of the proposed Logician model, and the feasibility of applying the end-to-end learning paradigm on supervised data sets for the challenging tasks of open information extraction.

URL

http://arxiv.org/abs/1904.12535

PDF

http://arxiv.org/pdf/1904.12535
Casting Geometric Constraints in Semantic Segmentation as Semi-Supervised Learning

2019-04-29

Sinisa Stekovic, Friedrich Fraundorfer, Vincent Lepetit

arXiv_AI

arXiv_AI Segmentation Semantic_Segmentation
Abstract

We propose a simple yet effective method to learn to segment new indoor scenes from an RGB-D sequence: State-of-the-art methods trained on one dataset, even as large as SUNRGB-D dataset, can perform poorly when applied to images that are not part of the dataset, because of the dataset bias, a common phenomenon in computer vision. To make semantic segmentation more useful in practice, we learn to segment new indoor scenes from sequences without manual annotations by exploiting geometric constraints and readily available training data from SUNRGB-D. As a result, we can then robustly segment new images of these scenes from color information only. To efficiently exploit geometric constraints for our purpose, we propose to cast these constraints as semi-supervised terms, which enforce the fact that the same class should be predicted for the projections of the same 3D location in different images. We show that this approach results in a simple yet very powerful method, which can annotate sequences of ScanNet and our own sequences using only annotations from SUNRGB-D.

URL

http://arxiv.org/abs/1904.12534

PDF

http://arxiv.org/pdf/1904.12534
A Complete Classification of the Complexity and Rewritability of Ontology-Mediated Queries based on the Description Logic EL

2019-04-29

Carsten Lutz, Leif Sabellek

arXiv_AI

arXiv_AI Ontology Classification
Abstract

We provide an ultimately fine-grained analysis of the data complexity and rewritability of ontology-mediated queries (OMQs) based on an EL ontology and a conjunctive query (CQ). Our main results are that every such OMQ is in AC0, NL-complete, or PTime-complete and that containment in NL coincides with rewritability into linear Datalog (whereas containment in AC0 coincides with rewritability into first-order logic). We establish natural characterizations of the three cases in terms of bounded depth and (un)bounded pathwidth, and show that each of the associated meta problems, such as deciding whether a given OMQ is rewritable into linear Datalog, is ExpTime-complete. We also give a way to construct linear Datalog rewritings when they exist and prove that there are no constant Datalog rewritings.

URL

http://arxiv.org/abs/1904.12533

PDF

http://arxiv.org/pdf/1904.12533
Effect of the nanowire diameter on the linearity of the response of GaN-based heterostructured nanowire photodetectors

2019-04-29

Maria Spies, Jakub Polaczyński, Akhil Ajay, Dipankar Kalita, Jonas Lähnemann, Bruno Gayral, Martien I. den Hertog, Eva Monroy

arXiv_CV

arXiv_CV Object_Detection GAN Detection
Abstract

Nanowire photodetectors are investigated because of their compatibility with flexible electronics, or for the implementation of on-chip optical interconnects. Such devices are characterized by ultrahigh photocurrent gain, but their photoresponse scales sublinearly with the optical power. Here, we present a study of single-nanowire photodetectors displaying a linear response to ultraviolet illumination. Their structure consists of a GaN nanowire incorporating an AlN/GaN/AlN heterostructure, which generates an internal electric field. The activity of the heterostructure is confirmed by the rectifying behavior of the current-voltage characteristics in the dark, as well as by the asymmetry of the photoresponse in magnitude and linearity. Under reverse bias (negative bias on the GaN cap segment), the detectors behave linearly with the impinging optical power when the nanowire diameter is below a certain threshold ($\approx$ 80 nm), which corresponds to the total depletion of the nanowire stem due to the Fermi level pinning at the sidewalls. In the case of nanowires that are only partially depleted, their nonlinearity is explained by a nonlinear variation of the diameter of their central conducting channel under illumination.

URL

https://arxiv.org/abs/1904.12515

PDF

https://arxiv.org/pdf/1904.12515
Meta Anti-spoofing: Learning to Learn in Face Anti-spoofing

2019-04-29

Chenxu Zhao, Yunxiao Qin, Zezheng Wang, Tianyu Fu, Hailin Shi

arXiv_CV

arXiv_CV Object_Detection Face Detection Recognition Face_Recognition
Abstract

Face anti-spoofing is crucial to the security of face recognition systems. Previously, most methods formulate face anti-spoofing as a supervised learning problem to detect various predefined presentation attacks (PA). However, new attack methods keep evolving that produce new forms of spoofing faces to compromise the existing detectors. This requires researchers to collect a large number of samples to train classifiers for detecting new attacks, which is often costly and leads the later newly evolved attack samples to remain in small scales. Alternatively, we define face anti-spoofing as a few-shot learning problem with evolving new attacks and propose a novel face anti-spoofing approach via meta-learning named Meta Face Anti-spoofing (Meta-FAS). Meta-FAS addresses the above-mentioned problems by training the classifiers how to learn to detect the spoofing faces with few examples. To assess the effectiveness of the proposed approach, we propose a series of evaluation benchmarks based on public datasets (e.g., OULU-NPU, SiW, CASIA-MFSD, Replay-Attack, MSU-MFSD, 3D-MAD, and CASIA-SURF), and the proposed approach outperforms the compared methods.

URL

http://arxiv.org/abs/1904.12490

PDF

http://arxiv.org/pdf/1904.12490
Self-Attention Capsule Networks for Image Classification

2019-04-29

Assaf Hoogi, Brian Wilcox, Yachee Gupta, Daniel L. Rubin

arXiv_CV

arXiv_CV Salient Attention CNN Image_Classification Classification Relation
Abstract

We propose a novel architecture for image classification, called Self-Attention Capsule Networks (SACN). SACN is the first model that incorporates the Self-Attention mechanism as an integral layer within the Capsule Network (CapsNet). While the Self-Attention mechanism selects the more dominant image regions to focus on, the CapsNet analyzes the relevant features and their spatial correlations inside these regions only. The features are extracted in the convolutional layer. Then, the Self-Attention layer learns to suppress irrelevant regions based on feature analysis, and highlights salient features useful for a specific task. The attention map is then fed into the CapsNet primary layer that is followed by a classification layer. The proposed SACN model was designed to use a relatively shallow CapsNet architecture to reduce computational load, and compensates for the absence of a deeper network by using the Self-Attention module to significantly improve the results. The proposed Self-Attention CapsNet architecture was extensively evaluated on five different datasets, mainly on three different medical sets, in addition to the natural MNIST and SVHN. The model was able to classify images and their patches with diverse and complex backgrounds better than the baseline CapsNet. As a result, the proposed Self-Attention CapsNet significantly improved classification performance within and across different datasets and outperformed the baseline CapsNet not only in classification accuracy but also in robustness.

URL

http://arxiv.org/abs/1904.12483

PDF

http://arxiv.org/pdf/1904.12483
Teaching AI, Ethics, Law and Policy

2019-04-29

Asher Wilk

arXiv_AI

arXiv_AI
Abstract

The cyberspace and the development of new technologies, especially intelligent systems using artificial intelligence, present enormous challenges to computer professionals, data scientists, managers and policy makers. There is a need to address professional responsibility, ethical, legal, societal, and policy issues. This paper presents problems and issues relevant to computer professionals and decision makers and suggests a curriculum for a course on ethics, law and policy. Such a course will create awareness of the ethics issues involved in building and using software and artificial intelligence.

URL

http://arxiv.org/abs/1904.12470

PDF

http://arxiv.org/pdf/1904.12470
HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision

2019-04-29

Zhen Dong, Zhewei Yao, Amir Gholami, Michael Mahoney, Kurt Keutzer

arXiv_CV

arXiv_CV Inference
Abstract

Model size and inference speed/power have become a major challenge in the deployment of Neural Networks for many applications. A promising approach to address these problems is quantization. However, uniformly quantizing a model to ultra low precision leads to significant accuracy degradation. A novel solution for this is to use mixed-precision quantization, as some parts of the network may allow lower precision as compared to other layers. However, there is no systematic way to determine the precision of different layers. A brute force approach is not feasible for deep networks, as the search space for mixed-precision is exponential in the number of layers. Another challenge is a similar factorial complexity for determining block-wise fine-tuning order when quantizing the model to a target precision. Here, we introduce Hessian AWare Quantization (HAWQ), a novel second-order quantization method to address these problems. HAWQ allows for the automatic selection of the relative quantization precision of each layer, based on the layer’s Hessian spectrum. Moreover, HAWQ provides a deterministic fine-tuning order for quantizing layers, based on second-order information. We show the results of our method on Cifar-10 using ResNet20, and on ImageNet using Inception-V3, ResNet50 and SqueezeNext models. Comparing HAWQ with state-of-the-art shows that we can achieve similar/better accuracy with $8\times$ activation compression ratio on ResNet20, as compared to DNAS, and up to $1\%$ higher accuracy with up to $14\%$ smaller models on ResNet50 and Inception-V3, compared to recently proposed methods of RVQuant and HAQ. Furthermore, we show that we can quantize SqueezeNext to just 1MB model size while achieving above $68\%$ top1 accuracy on ImageNet.
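A toy sketch of the general idea, choosing a per-layer bit width from second-order sensitivity, is shown below. It is a simplification under stated assumptions, not HAWQ: each layer's Hessian is given as a small explicit symmetric matrix, sensitivity is taken to be the top eigenvalue found by power iteration, and the ranking-to-bits rule in `assign_bits` is invented for illustration. All function names are hypothetical.

```python
import numpy as np

def quantize(weights, bits):
    """Uniform symmetric quantization to `bits` bits, returned in float form."""
    weights = np.asarray(weights, dtype=float)
    levels = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(weights))
    if max_abs == 0.0:
        return weights.copy()
    scale = max_abs / levels
    return np.round(weights / scale) * scale

def top_eigenvalue(hessian, iters=100, seed=0):
    """Largest eigenvalue of a symmetric positive semi-definite matrix by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=hessian.shape[0])
    v = v / np.linalg.norm(v)
    for _ in range(iters):
        v = hessian @ v
        v = v / np.linalg.norm(v)
    return float(v @ hessian @ v)

def assign_bits(hessians, choices=(8, 4, 2)):
    """Give more bits to layers with a larger top eigenvalue (assumed rule)."""
    sensitivity = [top_eigenvalue(h) for h in hessians]
    order = sorted(range(len(hessians)), key=lambda i: sensitivity[i], reverse=True)
    bits = [0] * len(hessians)
    for rank, layer in enumerate(order):
        bits[layer] = choices[rank * len(choices) // len(hessians)]
    return bits
```

In a real network the Hessian is far too large to form explicitly, so power iteration would rely on Hessian-vector products rather than a stored matrix.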

URL

http://arxiv.org/abs/1905.03696

PDF

http://arxiv.org/pdf/1905.03696
The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers

2019-04-29

Dongxiang Zhang, Lei Wang, Luming Zhang, Bing Tian Dai, Heng Tao Shen

arXiv_CL

arXiv_CL Attention Survey
Abstract

Solving mathematical word problems (MWPs) automatically is challenging, primarily due to the semantic gap between human-readable words and machine-understandable logics. Despite a long history dating back to the 1960s, MWPs have regained intensive attention in the past few years with the advancement of Artificial Intelligence (AI). Solving MWPs successfully is considered a milestone towards general AI. Many systems have claimed promising results in self-crafted and small-scale datasets. However, when applied on large and diverse datasets, none of the proposed methods in the literature achieves high precision, revealing that current MWP solvers still have much room for improvement. This motivated us to present a comprehensive survey to deliver a clear and complete picture of automatic math problem solvers. In this survey, we emphasize algebraic word problems, summarize their extracted features and proposed techniques to bridge the semantic gap, and compare their performance in the publicly accessible datasets. We also cover automatic solvers for other types of math problems such as geometric problems that require the understanding of diagrams. Finally, we identify several emerging research directions for the readers with interests in MWPs.

URL

http://arxiv.org/abs/1808.07290

PDF

http://arxiv.org/pdf/1808.07290
Person Re-identification Using Visual Attention

2019-04-29

Alireza Rahimpour, Liu Liu, Ali Taalimi, Yang Song, Hairong Qi

arXiv_CV

arXiv_CV Re-identification Attention Person_Re-identification CNN
Abstract

Despite recent attempts at solving the person re-identification problem, it remains a challenging task since a person’s appearance can vary significantly when large variations in view angle, human pose, and illumination are involved. In this paper, we propose a novel approach based on using a gradient-based attention mechanism in a deep convolutional neural network for solving the person re-identification problem. Our model learns to focus selectively on the parts of the input image to which the network’s output is most sensitive and processes them with high resolution while perceiving the surrounding image in low resolution. Extensive comparative evaluations demonstrate that the proposed method outperforms state-of-the-art approaches on the challenging CUHK01, CUHK03, and Market 1501 datasets.

URL

http://arxiv.org/abs/1707.07336

PDF

http://arxiv.org/pdf/1707.07336
Feature Encoding in Band-limited Distributed Surveillance Systems

2019-04-29

Alireza Rahimpour, Ali Taalimi, Hairong Qi

arXiv_CV

arXiv_CV Recognition
Abstract

Distributed surveillance systems have become popular in recent years due to security concerns. However, transmitting high dimensional data in bandwidth-limited distributed systems becomes a major challenge. In this paper, we address this issue by proposing a novel probabilistic algorithm based on the divergence between the probability distributions of the visual features in order to reduce their dimensionality and thus save the network bandwidth in distributed wireless smart camera networks. We demonstrate the effectiveness of the proposed approach through extensive experiments on two surveillance recognition tasks.

URL

http://arxiv.org/abs/1612.06423

PDF

http://arxiv.org/pdf/1612.06423
Challenges and Pitfalls of Reproducing Machine Learning Artifacts

2019-04-29

Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu

arXiv_AI

arXiv_AI
Abstract

An increasingly complex and diverse collection of Machine Learning (ML) models as well as hardware/software stacks, collectively referred to as “ML artifacts”, are being proposed, leading to a diverse landscape of ML. These proposed ML innovations have outpaced researchers’ ability to analyze, study and adapt them. This is exacerbated by the complicated and sometimes non-reproducible procedures for ML evaluation. The current practice of sharing ML artifacts is through repositories where artifact authors post ad-hoc code and some documentation. The authors often fail to reveal critical information for others to reproduce their results. One often fails to reproduce artifact authors’ claims, not to mention adapt the model to his/her own use. This article discusses the common challenges and pitfalls of reproducing ML artifacts, which can be used as a guideline for ML researchers when sharing or reproducing artifacts.

URL

http://arxiv.org/abs/1904.12437

PDF

http://arxiv.org/pdf/1904.12437
HOG feature extraction from encrypted images for privacy-preserving machine learning

2019-04-29

Masaki Kitayama, Hitoshi Kiya

arXiv_CV

arXiv_CV Object_Detection Face Image_Classification Classification Detection Recognition
Abstract

In this paper, we propose an extraction method of HOG (histograms-of-oriented-gradients) features from encryption-then-compression (EtC) images for privacy-preserving machine learning, where EtC images are images encrypted by a block-based encryption method proposed for EtC systems with JPEG compression, and HOG is a feature descriptor used in computer vision for the purpose of object detection and image classification. Recently, cloud computing and machine learning have been spreading in many fields. However, cloud computing has serious privacy issues for end users, due to unreliability of providers and some accidents. Accordingly, we propose a novel block-based extraction method of HOG features, and the proposed method enables us to carry out any machine learning algorithms without any influence, under some conditions. In an experiment, the proposed method is applied to a face image recognition problem with two kinds of classifiers, a linear support vector machine (SVM) and a Gaussian SVM, to demonstrate its effectiveness.

URL

http://arxiv.org/abs/1904.12434

PDF

http://arxiv.org/pdf/1904.12434
Automatic extrinsic calibration between a camera and a 3D Lidar using 3D point and plane correspondences

2019-04-29

Surabhi Verma, Julie Stephany Berrio, Stewart Worrall, Eduardo Nebot

arXiv_CV

arXiv_CV
Abstract

This paper proposes an automated method to obtain the extrinsic calibration parameters between a camera and a 3D lidar with as low as 16 beams. We use a checkerboard as a reference to obtain features of interest in both sensor frames. The calibration board centre point and normal vector are automatically extracted from the lidar point cloud by exploiting the geometry of the board. The corresponding features in the camera image are obtained from the camera’s extrinsic matrix. We explain the reasons behind selecting these features, and why they are more robust compared to other possibilities. To obtain the optimal extrinsic parameters, we choose a genetic algorithm to address the highly non-linear state space. The process is automated after defining the bounds of the 3D experimental region relative to the lidar, and the true board dimensions. In addition, the camera is assumed to be intrinsically calibrated. Our method requires a minimum of 3 checkerboard poses, and the calibration accuracy is demonstrated by evaluating our algorithm using real world and simulated features.

URL

http://arxiv.org/abs/1904.12433

PDF

http://arxiv.org/pdf/1904.12433
Attribute Guided Unpaired Image-to-Image Translation with Semi-supervised Learning

2019-04-29

Xinyang Li, Jie Hu, Shengchuan Zhang, Xiaopeng Hong, Qixiang Ye, Chenglin Wu, Rongrong Ji

arXiv_CV

arXiv_CV Image_Caption
Abstract

Unpaired Image-to-Image Translation (UIT) focuses on translating images among different domains by using unpaired data, which has received increasing research focus due to its practical usage. However, existing UIT schemes suffer from the need for supervised training, as well as the lack of encoding domain information. In this paper, we propose an Attribute Guided UIT model termed AGUIT to tackle these two challenges. AGUIT considers multi-modal and multi-domain tasks of UIT jointly with a novel semi-supervised setting, which also has merits in representation disentanglement and fine control of outputs. In particular, the benefits of AGUIT are two-fold: (1) It adopts a novel semi-supervised learning process by translating attributes of labeled data to unlabeled data, and then reconstructing the unlabeled data by a cycle consistency operation. (2) It decomposes image representation into domain-invariant content code and domain-specific style code. The redesigned style code embeds image style into two variables drawn from standard Gaussian distribution and the distribution of domain label, which facilitates the fine control of translation due to the continuity of both variables. Finally, we introduce a new challenge, i.e., disentangled transfer, for UIT models, which adopts the disentangled representation to translate data less related with the training set. Extensive experiments demonstrate the capacity of AGUIT over existing state-of-the-art models.

URL

http://arxiv.org/abs/1904.12428

PDF

http://arxiv.org/pdf/1904.12428
Mixture of Pre-processing Experts Model for Noise Robust Deep Learning on Resource Constrained Platforms

2019-04-29

Taesik Na, Minah Lee, Burhan A. Mudassar, Priyabrata Saha, Jong Hwan Ko, Saibal Mukhopadhyay

arXiv_CV

arXiv_CV Adversarial Object_Detection Tracking Object_Tracking Classification Deep_Learning Detection
Abstract

Deep learning on an edge device requires energy-efficient operation due to an ever-diminishing power budget. Intentionally low-quality data acquisition for longer battery life and natural noise from low-cost sensors degrade the quality of the target output, which hinders the adoption of deep learning on edge devices. To overcome these problems, we propose a simple yet efficient mixture of pre-processing experts (MoPE) model to handle various image distortions, including low resolution and noisy images. We also propose to use an adversarially trained autoencoder as a pre-processing expert for the noisy images. We evaluate our proposed method on various machine learning tasks, including object detection on the MS-COCO 2014 dataset, multiple object tracking on the MOT-Challenge dataset, and human activity classification on the UCF 101 dataset. Experimental results show that the proposed method achieves better detection, tracking and activity classification accuracies under noise without sacrificing accuracy on clean images. The overheads of our proposed MoPE are 0.67% and 0.17% in terms of memory and computation compared to the baseline object detection network.

URL

http://arxiv.org/abs/1904.12426

PDF

http://arxiv.org/pdf/1904.12426
Adversarial Speaker Adaptation

2019-04-29

Zhong Meng, Jinyu Li, Yifan Gong

arXiv_CL

arXiv_CL Adversarial Speech_Recognition Classification Recognition
Abstract

We propose a novel adversarial speaker adaptation (ASA) scheme, in which adversarial learning is applied to regularize the distribution of deep hidden features in a speaker-dependent (SD) deep neural network (DNN) acoustic model to be close to that of a fixed speaker-independent (SI) DNN acoustic model during adaptation. An additional discriminator network is introduced to distinguish the deep features generated by the SD model from those produced by the SI model. In ASA, with a fixed SI model as the reference, an SD model is jointly optimized with the discriminator network to minimize the senone classification loss, and simultaneously to mini-maximize the SI/SD discrimination loss on the adaptation data. With ASA, a senone-discriminative deep feature is learned in the SD model with a similar distribution to that of the SI model. With such a regularized and adapted deep feature, the SD model can perform improved automatic speech recognition on the target speaker’s speech. Evaluated on the Microsoft short message dictation dataset, ASA achieves 14.4% and 7.9% relative word error rate improvements for supervised and unsupervised adaptation, respectively, over an SI model trained on 2600 hours of data, with 200 adaptation utterances per speaker.

URL

http://arxiv.org/abs/1904.12407

PDF

http://arxiv.org/pdf/1904.12407
Adversarial Speaker Verification

2019-04-29

Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong

arXiv_CL

arXiv_CL Adversarial Embedding Classification Recognition
Abstract

The use of deep networks to extract embeddings for speaker recognition has proven successful. However, such embeddings are susceptible to performance degradation due to the mismatches among the training, enrollment, and test conditions. In this work, we propose an adversarial speaker verification (ASV) scheme to learn the condition-invariant deep embedding via adversarial multi-task training. In ASV, a speaker classification network and a condition identification network are jointly optimized to minimize the speaker classification loss and simultaneously mini-maximize the condition loss. The target labels of the condition network can be categorical (environment types) and continuous (SNR values). We further propose multi-factorial ASV to simultaneously suppress multiple factors that constitute the condition variability. Evaluated on a Microsoft Cortana text-dependent speaker verification task, the ASV achieves 8.8% and 14.5% relative improvements in equal error rates (EER) for known and unknown conditions, respectively.

URL

http://arxiv.org/abs/1904.12406

PDF

http://arxiv.org/pdf/1904.12406
A Comparison of Online Automatic Speech Recognition Systems and the Nonverbal Responses to Unintelligible Speech

2019-04-29

Joshua Y. Kim, Chunfeng Liu, Rafael A. Calvo, Kathryn McCabe, Silas C. R. Taylor, Björn W. Schuller, Kaihang Wu

arXiv_SD

arXiv_SD Speech_Recognition Recognition
Abstract

Automatic Speech Recognition (ASR) systems have proliferated in recent years to the point that free platforms such as YouTube now provide speech recognition services. Given the wide selection of ASR systems, we contribute to the field of automatic speech recognition by comparing the relative performance of two sets of manual transcriptions and five sets of automatic transcriptions (Google Cloud, IBM Watson, Microsoft Azure, Trint, and YouTube) to help researchers select accurate transcription services. In addition, we identify nonverbal behaviors that are associated with unintelligible speech, as indicated by high word error rates. We show that manual transcriptions remain superior to current automatic transcriptions. Amongst the automatic transcription services, YouTube offers the most accurate transcription service. For non-verbal behavioral involvement, we provide evidence that the variability of smile intensities from the listener is high (low) when the speaker is clear (unintelligible). These findings are derived from videoconferencing interactions between student doctors and simulated patients; therefore, we contribute towards both the ASR literature and the healthcare communication skills teaching community.

URL

http://arxiv.org/abs/1904.12403

PDF

http://arxiv.org/pdf/1904.12403
Scalable inference of topic evolution via models for latent geometric structures

2019-04-28

Mikhail Yurochkin, Zhiwei Fan, Aritra Guha, Paraschos Koutris, XuanLong Nguyen

arXiv_CL

arXiv_CL Inference
Abstract

We develop new models and algorithms for learning the temporal dynamics of the topic polytopes and related geometric objects that arise in topic model based inference. Our model is nonparametric Bayesian and the corresponding inference algorithm is able to discover new topics as time progresses. By exploiting the connection between the modeling of topic polytope evolution, the Beta-Bernoulli process and the Hungarian matching algorithm, our method is shown to be several orders of magnitude faster than existing topic modeling approaches, as demonstrated by experiments working with several million documents in under two dozen minutes.

URL

http://arxiv.org/abs/1809.08738

PDF

http://arxiv.org/pdf/1809.08738
Attentive Adversarial Learning for Domain-Invariant Training

2019-04-28

Zhong Meng, Jinyu Li, Yifan Gong

arXiv_CV

arXiv_CV Adversarial Attention Speech_Recognition Classification Recognition
Abstract

Adversarial domain-invariant training (ADIT) proves to be effective in suppressing the effects of domain variability in acoustic modeling and has led to improved performance in automatic speech recognition (ASR). In ADIT, an auxiliary domain classifier takes in equally-weighted deep features from a deep neural network (DNN) acoustic model and is trained to improve their domain-invariance by optimizing an adversarial loss function. In this work, we propose an attentive ADIT (AADIT) in which we advance the domain classifier with an attention mechanism to automatically weight the input deep features according to their importance in domain classification. With this attentive re-weighting, AADIT can focus on the domain normalization of phonetic components that are more susceptible to domain variability and generate deep features with improved domain-invariance and senone-discriminativity over ADIT. Most importantly, the attention block serves only as an external component to the DNN acoustic model and is not involved in ASR, so AADIT can be used to improve the acoustic modeling with any DNN architecture. More generally, the same methodology can improve any adversarial learning system with an auxiliary discriminator. Evaluated on the CHiME-3 dataset, the AADIT achieves 13.6% and 9.3% relative WER improvements, respectively, over a multi-conditional model and a strong ADIT baseline.

URL

http://arxiv.org/abs/1904.12400

PDF

http://arxiv.org/pdf/1904.12400
Conditional Teacher-Student Learning

2019-04-28

Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong

arXiv_CV

arXiv_CV Knowledge Prediction
Abstract

Teacher-student (T/S) learning has been shown to be effective for a variety of problems such as domain adaptation and model compression. One shortcoming of T/S learning is that a teacher model, not always perfect, sporadically produces wrong guidance in the form of posterior probabilities that misleads the student model towards suboptimal performance. To overcome this problem, we propose a conditional T/S learning scheme, in which a “smart” student model selectively chooses to learn from either the teacher model or the ground truth labels conditioned on whether the teacher can correctly predict the ground truth. Unlike a naive linear combination of the two knowledge sources, the conditional learning is exclusively engaged with the teacher model when the teacher model’s prediction is correct, and otherwise backs off to the ground truth. Thus, the student model is able to learn effectively from the teacher and even potentially surpass the teacher. We examine the proposed learning scheme on two tasks: domain adaptation on the CHiME-3 dataset and speaker adaptation on the Microsoft short message dictation dataset. The proposed method achieves 9.8% and 12.8% relative word error rate reductions, respectively, over T/S learning for environment adaptation and over a speaker-independent model for speaker adaptation.
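
The selection rule in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy posteriors, and the use of plain cross-entropy as the per-sample loss are all choices made here for clarity.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and predicted posteriors."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def conditional_ts_loss(student_probs, teacher_probs, label):
    """Per-sample conditional T/S loss (hypothetical helper).

    If the teacher's top prediction matches the ground-truth label, the
    student is trained against the teacher's soft posteriors; otherwise it
    backs off to the one-hot ground-truth label.
    """
    teacher_pred = max(range(len(teacher_probs)), key=teacher_probs.__getitem__)
    if teacher_pred == label:
        target = teacher_probs
    else:
        target = [1.0 if k == label else 0.0 for k in range(len(teacher_probs))]
    return cross_entropy(target, student_probs)


# Toy posteriors over three classes; the ground-truth label is class 0.
student = [0.6, 0.3, 0.1]
teacher_right = [0.7, 0.2, 0.1]  # teacher's top class is 0 (correct)
teacher_wrong = [0.2, 0.7, 0.1]  # teacher's top class is 1 (incorrect)

loss_soft = conditional_ts_loss(student, teacher_right, label=0)
loss_hard = conditional_ts_loss(student, teacher_wrong, label=0)
print(loss_soft, loss_hard)
```

Note that the switch is made per sample and is all-or-nothing, which is what distinguishes it from a fixed linear interpolation of the two targets.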

URL

http://arxiv.org/abs/1904.12399

PDF

http://arxiv.org/pdf/1904.12399
Stability conditions of an ODE arising in human motion and its numerical simulation

2019-04-28

Takahiro Kosugi, Hitoshi Kino, Masaaki Goto, Yuki Matsutani

arXiv_RO

arXiv_RO
Abstract

This paper discusses the stability of an equilibrium point of an ordinary differential equation (ODE) arising from a feed-forward position control for a musculoskeletal system. The studied system has a link, a joint and two muscles with routing points. The motion convergence of the system strongly depends on the muscular arrangement of the musculoskeletal system. In this paper, a sufficient condition for asymptotic stability is obtained. Furthermore, numerical simulations of the penalized ODE and experimental results are described.

URL

http://arxiv.org/abs/1904.12394

PDF

http://arxiv.org/pdf/1904.12394
TMIXT: A process flow for Transcribing MIXed handwritten and machine-printed Text

2019-04-28

Fady Medhat, Mahnaz Mohammadi, Sardar Jaf, Chris G. Willcocks, Toby P. Breckon, Peter Matthews, Andrew Stephen McGough, Georgios Theodoropoulos, Boguslaw Obara

arXiv_CV

arXiv_CV GAN Classification Recognition
Abstract

Handling large corpora of documents is of significant importance in many fields, no more so than in the areas of crime investigation and defence, where an organisation may be presented with a large volume of scanned documents which need to be processed in a finite time. However, this problem is exacerbated both by the volume of scanned documents and by the complexity of the pages, which often contain many different elements that each need to be processed and understood. Text recognition, which is a primary task of this process, is usually dependent upon the type of text, being either handwritten or machine-printed. Accordingly, the recognition involves prior classification of the text category, before deciding on the recognition method to be applied. This poses a more challenging task if a document contains both handwritten and machine-printed text. In this work, we present a generic process flow for text recognition in scanned documents containing mixed handwritten and machine-printed text without the need to classify text in advance. We realize the proposed process flow using several open-source image processing and text recognition packages. The evaluation is performed using a specially developed variant, presented in this work, of the IAM handwriting database, where we achieve an average transcription accuracy of nearly 80% for pages containing both printed and handwritten text.

URL

http://arxiv.org/abs/1904.12387

PDF

http://arxiv.org/pdf/1904.12387
Machine Learning in the Air

2019-04-28

Deniz Gunduz, Paul de Kerret, Nicholas D. Sidiropoulos, David Gesbert, Chandra Murthy, Mihaela van der Schaar

arXiv_AI

arXiv_AI Review Face
Abstract

Thanks to the recent advances in processing speed and data acquisition and storage, machine learning (ML) is penetrating every facet of our lives, and transforming research in many areas in a fundamental manner. Wireless communications is another success story – ubiquitous in our lives, from handheld devices to wearables, smart homes, and automobiles. While recent years have seen a flurry of research activity in exploiting ML tools for various wireless communication problems, the impact of these techniques in practical communication systems and standards is yet to be seen. In this paper, we review some of the major promises and challenges of ML in wireless communication systems, focusing mainly on the physical layer. We present some of the most striking recent accomplishments that ML techniques have achieved with respect to classical approaches, and point to promising research directions where ML is likely to make the biggest impact in the near future. We also highlight the complementary problem of designing physical layer techniques to enable distributed ML at the wireless network edge, which further emphasizes the need to understand and connect ML with fundamental concepts in wireless communications.

URL

http://arxiv.org/abs/1904.12385

PDF

http://arxiv.org/pdf/1904.12385
Enhancing Prediction Models for One-Year Mortality in Patients with Acute Myocardial Infarction and Post Myocardial Infarction Syndrome

2019-04-28

Seyedeh Neelufar Payrovnaziri, Laura A. Barrett, Daniel Bis, Jiang Bian, Zhe He

arXiv_AI

arXiv_AI Embedding Deep_Learning Prediction
Abstract

Predicting the risk of mortality for patients with acute myocardial infarction (AMI) using electronic health records (EHRs) data can help identify risky patients who might need more tailored care. In our previous work, we built computational models to predict one-year mortality of patients admitted to an intensive care unit (ICU) with AMI or post myocardial infarction syndrome. Our prior work only used the structured clinical data from MIMIC-III, a publicly available ICU clinical database. In this study, we enhanced our work by adding the word embedding features from free-text discharge summaries. Using a richer set of features resulted in significant improvement in the performance of our deep learning models. The average accuracy of our deep learning models was 92.89% and the average F-measure was 0.928. We further reported the impact of different combinations of features extracted from structured and/or unstructured data on the performance of the deep learning models.

URL

http://arxiv.org/abs/1904.12383

PDF

http://arxiv.org/pdf/1904.12383
Frequency Domain Multi-channel Acoustic Modeling for Distant Speech Recognition

2019-04-28

Minhua Wu, Kenichi Kumatani, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister

arXiv_SD

arXiv_SD Knowledge Speech_Recognition Optimization RNN Recognition
Abstract

Conventional far-field automatic speech recognition (ASR) systems typically employ microphone array techniques for speech enhancement in order to improve robustness against noise or reverberation. However, such speech enhancement techniques do not always yield ASR accuracy improvement because the optimization criterion for speech enhancement is not directly relevant to the ASR objective. In this work, we develop new acoustic modeling techniques that optimize spatial filtering and long short-term memory (LSTM) layers from multi-channel (MC) input based on an ASR criterion directly. In contrast to conventional methods, we incorporate array processing knowledge into the acoustic model. Moreover, we initialize the network with beamformers’ coefficients. We investigate effects of such MC neural networks through ASR experiments on the real-world far-field data where users are interacting with an ASR system in uncontrolled acoustic environments. We show that our MC acoustic model can reduce the word error rate (WER) by 16.5% on average compared to a single channel ASR system with the traditional log-mel filter bank energy (LFBE) feature. Our result also shows that our network with the spatial filtering layer on two-channel input achieves a relative WER reduction of 9.5% compared to conventional beamforming with seven microphones.
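
For readers unfamiliar with the metric, the arithmetic behind a relative WER reduction can be sketched as follows. The absolute WER values in the example are hypothetical, chosen only to illustrate the formula; the abstract does not report absolute error rates, and the helper name is an assumption made here.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical absolute WERs (in percent), not taken from the paper.
baseline = 10.0
improved = 8.35
reduction = relative_wer_reduction(baseline, improved)
print(f"{reduction:.1f}% relative reduction")
```

Under these made-up numbers, a drop from 10.0% to 8.35% absolute WER corresponds to a 16.5% relative reduction, which is much larger than the 1.65-point absolute difference.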

URL

http://arxiv.org/abs/1903.05299

PDF

http://arxiv.org/pdf/1903.05299
Dynamic Environment Prediction in Urban Scenes using Recurrent Representation Learning

2019-04-28

Masha Itkina, Katherine Driggs-Campbell, Mykel J. Kochenderfer

arXiv_RO

arXiv_RO CNN Represenation_Learning Prediction
Abstract

A key challenge for autonomous driving is safe trajectory planning in cluttered, urban environments with dynamic obstacles, such as pedestrians, bicyclists, and other vehicles. A reliable prediction of the future environment state, including the behavior of dynamic agents, would allow planning algorithms to proactively generate a trajectory in response to a rapidly changing environment. We present a novel framework that predicts the future occupancy state of the local environment surrounding an autonomous agent by learning a motion model from occupancy grid data using a neural network. We take advantage of the temporal structure of the grid data by utilizing a convolutional long short-term memory network in the form of the PredNet architecture. This method is validated on the KITTI dataset and demonstrates higher accuracy and better predictive power than baseline methods.

URL

http://arxiv.org/abs/1904.12374

PDF

http://arxiv.org/pdf/1904.12374
Counterexample-Driven Synthesis for Probabilistic Program Sketches

2019-04-28

Milan Češka, Christian Hensel, Sebastian Junges, Joost-Pieter Katoen

arXiv_AI

arXiv_AI Quantitative
Abstract

Probabilistic programs are key to dealing with uncertainty in, e.g., controller synthesis. They are typically small but intricate. Their development is complex and error-prone, requiring quantitative reasoning over a myriad of alternative designs. To mitigate this complexity, we adopt counterexample-guided inductive synthesis (CEGIS) to automatically synthesise finite-state probabilistic programs. Our approach leverages efficient model checking, modern SMT solving, and counterexample generation at program level. Experiments on practically relevant case studies show that design spaces with millions of candidate designs can be fully explored using a few thousand verification queries.

URL

http://arxiv.org/abs/1904.12371

PDF

http://arxiv.org/pdf/1904.12371
LeGR: Filter Pruning via Learned Global Ranking

2019-04-28

Ting-Wu Chin, Ruizhou Ding, Cha Zhang, Diana Marculescu

arXiv_CV

arXiv_CV CNN Optimization
Abstract

Filter pruning has been shown to be effective for learning resource-constrained convolutional neural networks (CNNs). However, prior methods for resource-constrained filter pruning have some limitations that hinder their effectiveness and efficiency. When searching for constraint-satisfying CNNs, prior methods either alter the optimization objective or adopt local search algorithms with heuristic parameterization, which are sub-optimal, especially in the low-resource regime. From the efficiency perspective, searching for constraint-satisfying CNNs with prior methods is often costly. In this work, we propose learned global ranking, dubbed LeGR, which improves upon prior art in the two aforementioned dimensions. Inspired by theoretical analysis, LeGR is parameterized to learn layer-wise affine transformations over the filter norms to construct a learned global ranking. With global ranking, resource-constrained filter pruning at various constraint levels can be done efficiently. We conduct extensive empirical analyses to demonstrate the effectiveness of the proposed algorithm with ResNet and MobileNetV2 networks on CIFAR-10, CIFAR-100, Bird-200, and ImageNet datasets. Code is publicly available at https://github.com/cmu-enyac/LeGR.
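
The idea of ranking filters globally after a layer-wise affine transformation of their norms can be sketched as below. This is only an illustration of the scoring step, assuming the affine parameters are already given: in the paper they are learned, whereas here `alphas` and `kappas` are hand-picked, and all function names are hypothetical. The authors' actual code is at the repository linked in the abstract.

```python
def global_ranking(filter_norms, alphas, kappas):
    """Score every filter as alpha_l * norm + kappa_l and sort globally.

    filter_norms: list of per-layer lists of filter norms.
    alphas, kappas: per-layer affine parameters (hand-picked here).
    Returns (score, layer_index, filter_index) tuples, lowest score first.
    """
    scored = [
        (alphas[layer] * norm + kappas[layer], layer, idx)
        for layer, norms in enumerate(filter_norms)
        for idx, norm in enumerate(norms)
    ]
    return sorted(scored)


def select_filters_to_prune(filter_norms, alphas, kappas, num_to_prune):
    """Return the (layer, filter) pairs with the lowest global scores."""
    ranked = global_ranking(filter_norms, alphas, kappas)
    return {(layer, idx) for _, layer, idx in ranked[:num_to_prune]}


# Toy network: layer 0 has three filters, layer 1 has two.
norms = [[0.9, 0.1, 0.5], [2.0, 1.5]]
pruned = select_filters_to_prune(
    norms, alphas=[1.0, 0.2], kappas=[0.0, 0.0], num_to_prune=2
)
print(sorted(pruned))
```

Once the ranking exists, a different constraint level only changes how many entries are taken from the front of the list, which is why a single ranking can serve several budgets. A practical version would also need a guard against removing every filter in a layer, which this sketch omits.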

URL

http://arxiv.org/abs/1904.12368

PDF

http://arxiv.org/pdf/1904.12368
Unsupervised Feature Learning for Point Cloud by Contrasting and Clustering With Graph Convolutional Neural Network

2019-04-28

Ling Zhang, Zhigang Zhu

arXiv_CV

arXiv_CV CNN Classification
Abstract

To alleviate the cost of collecting and annotating large-scale point cloud datasets, we propose an unsupervised learning approach to learn features from an unlabeled point cloud “3D object” dataset by using part contrasting and object clustering with deep graph neural networks (GNNs). In the contrast learning step, all the samples in the 3D object dataset are cut into two parts and put into a “part” dataset. Then a contrast learning GNN (ContrastNet) is trained to verify whether two randomly sampled parts from the part dataset belong to the same object. In the cluster learning step, the trained ContrastNet is applied to all the samples in the original 3D object dataset to extract features, which are used to group the samples into clusters. Then another GNN for clustering learning (ClusterNet) is trained to predict the cluster ID of all the training samples. The contrasting learning forces the ContrastNet to learn high-level semantic features of objects but probably ignores low-level features, while the ClusterNet improves the quality of learned features by being trained to discover objects that probably belong to the same semantic categories by the use of cluster IDs. We have conducted extensive experiments to evaluate the proposed framework on point cloud classification tasks. The proposed unsupervised learning approach obtained comparable performance to the state-of-the-art unsupervised learning methods that used much more complicated network structures. The code of this work is publicly available via: https://github.com/lingzhang1/ContrastNet.
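
The data preparation for the contrast learning step can be sketched as follows. The abstract does not say how objects are cut, so this sketch assumes a cut by an axis-aligned plane through the centroid along a randomly chosen axis; that choice, the function names, and the random toy point clouds are all assumptions made here, and the authors' code at the linked repository may differ.

```python
import random


def split_object(points, axis):
    """Cut one point cloud into two parts with a plane through its centroid."""
    mean = sum(p[axis] for p in points) / len(points)
    part_a = [p for p in points if p[axis] < mean]
    part_b = [p for p in points if p[axis] >= mean]
    return part_a, part_b


def make_part_pairs(objects, num_pairs, rng):
    """Sample part pairs labeled 1 (same object) or 0 (different objects)."""
    parts = []  # list of (object_id, part) entries forming the "part" dataset
    for obj_id, points in enumerate(objects):
        for part in split_object(points, axis=rng.randrange(3)):
            parts.append((obj_id, part))
    pairs = []
    for _ in range(num_pairs):
        (id_a, part_a), (id_b, part_b) = rng.sample(parts, 2)
        pairs.append((part_a, part_b, int(id_a == id_b)))
    return pairs


rng = random.Random(0)
# Four toy "objects", each a cloud of 64 random 3D points.
objects = [
    [(rng.random(), rng.random(), rng.random()) for _ in range(64)]
    for _ in range(4)
]
pairs = make_part_pairs(objects, num_pairs=8, rng=rng)
print([label for _, _, label in pairs])
```

The labels come for free from the object identity of each part, which is what makes the pretext task unsupervised with respect to semantic categories.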

URL

http://arxiv.org/abs/1904.12359

PDF

http://arxiv.org/pdf/1904.12359
Vector Autoregressive POMDP Model Learning and Planning for Human-Robot Collaboration

2019-04-28

Wei Zheng, Hai Lin

arXiv_RO

arXiv_RO Relation
Abstract

Human-robot collaboration (HRC) has emerged as a hot research area at the intersection of control, robotics, and psychology in recent years. It is of critical importance to obtain an expressive yet tractable model for human beings in HRC. In this paper, we propose a model called the Vector Autoregressive POMDP (VAR-POMDP), which extends the traditional POMDP model by considering the correlation among observations. The VAR-POMDP model is more powerful in the expressiveness of features than the traditional continuous observation POMDP since the traditional one is a special case of the VAR-POMDP model. Meanwhile, the proposed VAR-POMDP model is also tractable, as we show that it can be effectively learned from data and we can extend point-based value iteration (PBVI) to VAR-POMDP planning. Particularly, in this paper, we propose to use Bayesian non-parametric learning to decide potential human states and learn a VAR-POMDP model using data collected from human demonstrations. Then, we consider planning with respect to PCTL, which is widely used to express safety and reachability requirements in robotics. Finally, the advantage of using the proposed model for HRC is validated by experimental results using data collected from a driver-assistance test-bed.

URL

http://arxiv.org/abs/1904.12357

PDF

http://arxiv.org/pdf/1904.12357
Deferred Neural Rendering: Image Synthesis using Neural Textures

2019-04-28

Justus Thies, Michael Zollhöfer, Matthias Nießner

arXiv_CV

arXiv_CV Face
Abstract

The modern computer graphics pipeline can synthesize images at remarkable visual quality; however, it requires well-defined, high-quality 3D content as input. In this work, we explore the use of imperfect 3D content, for instance, obtained from photo-metric reconstructions with noisy and incomplete surface geometry, while still aiming to produce photo-realistic (re-)renderings. To address this challenging problem, we introduce Deferred Neural Rendering, a new paradigm for image synthesis that combines the traditional graphics pipeline with learnable components. Specifically, we propose Neural Textures, which are learned feature maps that are trained as part of the scene capture process. Similar to traditional textures, neural textures are stored as maps on top of 3D mesh proxies; however, the high-dimensional feature maps contain significantly more information, which can be interpreted by our new deferred neural rendering pipeline. Both neural textures and deferred neural renderer are trained end-to-end, enabling us to synthesize photo-realistic images even when the original 3D content was imperfect. In contrast to traditional, black-box 2D generative neural networks, our 3D representation gives us explicit control over the generated output, and allows for a wide range of application domains. For instance, we can synthesize temporally-consistent video re-renderings of recorded 3D scenes as our representation is inherently embedded in 3D space. This way, neural textures can be utilized to coherently re-render or manipulate existing video content in both static and dynamic environments at real-time rates. We show the effectiveness of our approach in several experiments on novel view synthesis, scene editing, and facial reenactment, and compare to state-of-the-art approaches that leverage the standard graphics pipeline as well as conventional generative neural networks.

URL

http://arxiv.org/abs/1904.12356

PDF

http://arxiv.org/pdf/1904.12356
Cough Detection Using Hidden Markov Models

2019-04-28

Aydin Teyhouee, Nathaniel D. Osgood

arXiv_SD

arXiv_SD Classification Detection
Abstract

Respiratory infections and chronic respiratory diseases impose a heavy health burden worldwide. Coughing is one of the most common symptoms of many such infections, and can be indicative of flare-ups of chronic respiratory diseases. Whether at a clinical or public health level, the capacity to identify bouts of coughing can aid understanding of population and individual health status. Developing health monitoring models in the context of respiratory diseases and also seasonal diseases with symptoms such as cough has the potential to improve quality of life, help clinicians and public health authorities with their decisions and decrease the cost of health services. In this paper, we investigated the extent to which a simple machine learning approach in the form of Hidden Markov Models (HMMs) could be used to classify different states of coughing using univariate (with a single energy band as the input feature) and multivariate (with multiple energy bands as the input features) binned time series of cough data. We further used the model to distinguish cough events from other events and environmental noise. Our Hidden Markov algorithm achieved 92% AUR (Area Under Receiver Operating Characteristic Curve) in classifying coughing events in noisy environments. Moreover, comparison of univariate with multivariate HMMs suggests a high accuracy of multivariate HMMs for cough event classifications.
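
One common way to use HMMs for this kind of event classification is to fit one model per class and pick the class whose model assigns the observed sequence the higher likelihood. The sketch below shows that approach for a univariate binned energy sequence using the scaled forward algorithm. It is not necessarily the authors' setup: the two-model scheme, the three energy bins, and every probability in the toy models are assumptions made here for illustration.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for discrete observations."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states))
                * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)  # rescale each step to avoid numerical underflow
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


# Hypothetical two-state models over three energy bins (0 = low, 2 = high).
COUGH = dict(
    start=[0.5, 0.5],
    trans=[[0.8, 0.2], [0.2, 0.8]],
    emit=[[0.1, 0.3, 0.6], [0.2, 0.5, 0.3]],
)
NOISE = dict(
    start=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    emit=[[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]],
)


def classify(obs):
    """Label a binned energy sequence by comparing model log-likelihoods."""
    cough_ll = log_likelihood(obs, **COUGH)
    noise_ll = log_likelihood(obs, **NOISE)
    return "cough" if cough_ll > noise_ll else "noise"


print(classify([2, 2, 1, 2]), classify([0, 0, 1, 0]))
```

A multivariate variant would replace the per-state emission table with a joint distribution over several energy bands; the forward recursion itself stays the same.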

URL

http://arxiv.org/abs/1904.12354

PDF

http://arxiv.org/pdf/1904.12354
Actor-Expert: A Framework for using Q-learning in Continuous Action Spaces

2019-04-28

Sungsu Lim, Ajin Joseph, Lei Le, Yangchen Pan, Martha White

arXiv_AI

arXiv_AI Optimization
Abstract

Q-learning can be difficult to use in continuous action spaces, because an optimization has to be solved to find the maximal action for the action-values. A common strategy has been to restrict the functional form of the action-values to be concave in the actions, to simplify the optimization. Such restrictions, however, can prevent learning accurate action-values. In this work, we propose a new policy search objective that facilitates using Q-learning and a framework to optimize this objective, called Actor-Expert. The Expert uses Q-learning to update the action-values towards optimal action-values. The Actor learns the maximal actions over time for these changing action-values. We develop a Cross Entropy Method (CEM) for the Actor, where such a global optimization approach facilitates use of generically parameterized action-values. This method - which we call Conditional CEM - iteratively concentrates density around maximal actions, conditioned on state. We prove that this algorithm tracks the expected CEM update, over states with changing action-values. We demonstrate in a toy environment that previous methods that restrict the action-value parameterization fail whereas Actor-Expert with a more general action-value parameterization succeeds. Finally, we demonstrate that Actor-Expert performs as well as or better than competitors on four benchmark continuous-action environments.
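
As background for the Actor component, the plain cross-entropy method for maximizing a function of a scalar action can be sketched as follows. This is the unconditioned textbook CEM, not the paper's Conditional CEM, which additionally conditions the sampling distribution on state. The toy action-value function, the Gaussian sampling distribution, and all hyperparameters are assumptions made here.

```python
import random
import statistics


def cem_maximize(q, rng, mean=0.0, std=1.0, n_samples=50, n_elite=10, n_iters=20):
    """Plain cross-entropy method for maximizing q(action) over a scalar action."""
    for _ in range(n_iters):
        # Sample candidate actions from the current Gaussian.
        actions = [rng.gauss(mean, std) for _ in range(n_samples)]
        # Keep the candidates with the highest action-values.
        elite = sorted(actions, key=q, reverse=True)[:n_elite]
        # Refit the Gaussian to the elite set, concentrating density on them.
        mean = statistics.mean(elite)
        std = max(statistics.pstdev(elite), 1e-3)  # floor keeps sampling alive
    return mean


def toy_q(action):
    """Hypothetical action-value for one fixed state, maximal at action 0.3."""
    return -(action - 0.3) ** 2


best = cem_maximize(toy_q, random.Random(0))
print(round(best, 2))
```

Because CEM only needs to evaluate the action-values at sampled actions, it places no concavity requirement on how they are parameterized, which is the property the abstract relies on.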

URL

http://arxiv.org/abs/1810.09103

PDF

http://arxiv.org/pdf/1810.09103
Real-time Trajectory Generation for Quadrotors using B-spline based Non-uniform Kinodynamic Search

2019-04-28

Lvbang Tang, Hesheng Wang

arXiv_RO

arXiv_RO
Abstract

In this paper, we propose a time-efficient approach to generate safe, smooth and dynamically feasible trajectories for quadrotors in obstacle-cluttered environments. By using the uniform B-spline to represent trajectories, we transform trajectory planning into a graph-search problem over B-spline control points in discretized space. A highly strict convex hull property of the B-spline is derived to guarantee the dynamical feasibility of the entire trajectory. A novel non-uniform kinodynamic search strategy is adopted, and the step length is dynamically adjusted during the search process according to the Euclidean signed distance field (ESDF), so that the trajectory achieves reasonable time allocation and stays away from obstacles. Non-static initial and goal states are allowed; therefore, it can be used for online local replanning as well as global planning. Extensive simulation and hardware experiments show that our method achieves higher performance compared with the state-of-the-art method.
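
The standard convex hull property that the abstract builds on can be checked numerically. The sketch below assumes a uniform cubic B-spline (the abstract does not state the degree) and uses hypothetical control points; it illustrates the basic property only, not the stricter bound derived in the paper.

```python
def cubic_bspline_weights(u):
    """Basis weights of one uniform cubic B-spline segment for u in [0, 1]."""
    return (
        (1 - u) ** 3 / 6.0,
        (3 * u**3 - 6 * u**2 + 4) / 6.0,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6.0,
        u**3 / 6.0,
    )


def evaluate_segment(control_points, u):
    """Evaluate a segment defined by four control points (tuples of floats)."""
    weights = cubic_bspline_weights(u)
    dim = len(control_points[0])
    return tuple(
        sum(w * p[d] for w, p in zip(weights, control_points)) for d in range(dim)
    )


# Hypothetical 2D control points for a single segment.
ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.5), (4.0, 0.5)]
samples = [evaluate_segment(ctrl, i / 10.0) for i in range(11)]

# The weights are non-negative and sum to one, so every curve point is a
# convex combination of the four control points and stays inside their hull
# (and therefore inside their axis-aligned bounding box, checked here).
inside_box = all(
    min(p[d] for p in ctrl) <= point[d] <= max(p[d] for p in ctrl)
    for point in samples
    for d in range(2)
)
print(inside_box)
```

The same argument applies to derivative curves, whose control points are differences of neighboring control points, so bounding those differences bounds velocity and acceleration along the whole segment.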

URL

http://arxiv.org/abs/1904.12348

PDF

http://arxiv.org/pdf/1904.12348
Domain Agnostic Learning with Disentangled Representations

2019-04-28

Xingchao Peng, Zijun Huang, Ximeng Sun, Kate Saenko

arXiv_CV

arXiv_CV Adversarial Knowledge Image_Classification Classification
Abstract

Unsupervised model transfer has the potential to greatly improve the generalizability of deep models to novel domains. Yet the current literature assumes that the separation of target data into distinct domains is known a priori. In this paper, we propose the task of Domain-Agnostic Learning (DAL): how can knowledge be transferred from a labeled source domain to unlabeled data from arbitrary target domains? To tackle this problem, we devise a novel Deep Adversarial Disentangled Autoencoder (DADA) capable of disentangling domain-specific features from class identity. We demonstrate experimentally that when the target domain labels are unknown, DADA leads to state-of-the-art performance on several image classification datasets.

URL

http://arxiv.org/abs/1904.12347

PDF

http://arxiv.org/pdf/1904.12347
Read All
Supporting Video Queries on Zero-Streaming Cameras

2019-04-28

Mengwei Xu, Tiantu Xu, Yunxin Liu, Xuanzhe Liu, Gang Huang, Felix Xiaozhu Lin

arXiv_CV

arXiv_CV Sparse Knowledge
Abstract

As the number of low-cost surveillance cameras grows rapidly, we advocate for these cameras to be zero streaming: ingesting videos directly to their local storage and only communicating with the cloud in response to queries. To support queries over videos stored on zero-streaming cameras, we describe a system that spans the cloud and cameras. The system builds on two unconventional ideas. When ingesting video frames, a camera learns accurate knowledge on a sparse sample of frames, rather than learning inaccurate knowledge on all frames; in executing one query, a camera processes frames in multiple passes with multiple operators trained and picked by the cloud during the query, rather than one-pass processing with operator(s) decided ahead of the query. On diverse queries over 750-hour videos and with typical wireless network bandwidth and low-cost camera hardware, our system runs at more than 100x video realtime. It outperforms competitive alternative designs by at least 4x and up to two orders of magnitude.

URL

http://arxiv.org/abs/1904.12342

PDF

http://arxiv.org/pdf/1904.12342
Read All
Learning walk and trot from the same objective using different types of exploration

2019-04-28

Zinan Liu, Kai Ploeger, Svenja Stark, Elmar Rueckert, Jan Peters

arXiv_RO

arXiv_RO Knowledge
Abstract

In quadruped gait learning, policy search methods that scale to high-dimensional continuous action spaces are commonly used. In most approaches, it is necessary to introduce prior knowledge on the gaits to limit the highly non-convex search space of the policies. In this work, we propose a new approach that encodes the symmetry properties of the desired gaits in the initial covariance of the Gaussian search distribution, allowing for strategic exploration. Using episode-based likelihood ratio policy gradient and relative entropy policy search, we learned walk and trot gaits on a simulated quadruped. Comparing these gaits to random gaits learned with an initial diagonal covariance matrix, we show that the performance can be significantly enhanced.
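One way to picture a symmetry-encoding initial covariance is sketched below. The abstract does not give the paper's construction, so everything here is an assumption: one policy parameter per leg, a trot pattern in which diagonal legs move in phase, and a correlation matrix of the form (1 - rho) I + rho s s^T built from a hypothetical sign vector `s`.

```python
import numpy as np

# Assumed leg ordering with one policy parameter per leg (illustrative only).
LEGS = ["front_left", "front_right", "hind_left", "hind_right"]


def symmetric_gait_covariance(phase_signs, rho=0.8, sigma=0.5):
    """Covariance whose off-diagonal signs follow the desired leg pairing.

    Legs with equal signs are positively correlated, legs with opposite
    signs negatively. The matrix is positive definite for 0 <= rho < 1.
    """
    s = np.asarray(phase_signs, dtype=float)
    correlation = (1.0 - rho) * np.eye(len(s)) + rho * np.outer(s, s)
    return sigma ** 2 * correlation


# Trot: diagonal pairs (front_left, hind_right) and (front_right, hind_left) in phase.
trot_cov = symmetric_gait_covariance([1, -1, -1, 1])

# Baseline: uncorrelated exploration with the same per-parameter variance.
diag_cov = 0.5 ** 2 * np.eye(4)

rng = np.random.default_rng(0)
trot_samples = rng.multivariate_normal(np.zeros(4), trot_cov, size=1000)
```

Samples drawn from `trot_cov` perturb diagonal legs together, so exploration starts inside the symmetric family of gaits, whereas `diag_cov` perturbs each leg independently.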

URL

http://arxiv.org/abs/1904.12336

PDF

http://arxiv.org/pdf/1904.12336
Read All
Measuring similarity between geo-tagged videos using largest common view

2019-04-28

Wei Ding, KwangSoo Yang, Kwang Woo Nam

arXiv_CV

arXiv_CV Relation
Abstract

This paper presents a novel problem of discovering similar trajectories based on the field of view (FoV) of video data. The problem is important for many societal applications such as grouping moving objects, classifying geo-images, and identifying interesting trajectory patterns. Prior work considers only either spatial locations or the spatial relationship between two line segments. However, these approaches are limited in finding similar moving objects with common views. In this paper, we propose a new algorithm that can group both spatial locations and points of view to identify similar trajectories. We also propose novel methods that reduce the computational cost of the proposed work. Experimental results using real-world datasets demonstrate that the proposed approach outperforms prior work and reduces the computational cost.

URL

http://arxiv.org/abs/1905.03695

PDF

http://arxiv.org/pdf/1905.03695
Read All
OPIEC: An Open Information Extraction Corpus

2019-04-28

Kiril Gashteovski, Sebastian Wanner, Sven Hertling, Samuel Broscheit, Rainer Gemulla

arXiv_CL

arXiv_CL Knowledge Relation
Abstract

Open information extraction (OIE) systems extract relations and their arguments from natural language text in an unsupervised manner. The resulting extractions are a valuable resource for downstream tasks such as knowledge base construction, open question answering, or event schema induction. In this paper, we release, describe, and analyze an OIE corpus called OPIEC, which was extracted from the text of English Wikipedia. OPIEC complements the available OIE resources: It is the largest OIE corpus publicly available to date (over 340M triples) and contains valuable metadata such as provenance information, confidence scores, linguistic annotations, and semantic annotations including spatial and temporal information. We analyze the OPIEC corpus by comparing its content with knowledge bases such as DBpedia or YAGO, which are also based on Wikipedia. We found that most of the facts between entities present in OPIEC cannot be found in DBpedia and/or YAGO, that OIE facts often differ in the level of specificity compared to knowledge base facts, and that OIE open relations are generally highly polysemous. We believe that the OPIEC corpus is a valuable resource for future research on automated knowledge base construction.

URL

http://arxiv.org/abs/1904.12324

PDF

http://arxiv.org/pdf/1904.12324
Read All
An approach to image denoising using manifold approximation without clean images

2019-04-28

Rohit Jena

arXiv_CV

arXiv_CV Super_Resolution Image_Enhancement Deep_Learning
Abstract

Image restoration has been an extensively researched topic in numerous fields. With the advent of deep learning, a lot of the current algorithms were replaced by algorithms that are more flexible and robust. Deep networks have demonstrated impressive performance in a variety of tasks like blind denoising, image enhancement, deblurring, super-resolution, inpainting, among others. Most of these learning-based algorithms use a large amount of clean data during the training process. However, in certain applications in medical image processing, one may not have access to a large amount of clean data. In this paper, we propose a method for denoising that attempts to learn the denoising process by pushing the noisy data close to the clean data manifold, using only noisy images during training. Furthermore, we use perceptual loss terms and an iterative refinement step to further refine the clean images without losing important features.

URL

http://arxiv.org/abs/1904.12323

PDF

http://arxiv.org/pdf/1904.12323
Read All
Classification and Detection in Mammograms with Weak Supervision via Dual Branch Deep Neural Net

2019-04-28

Ran Bakalo, Rami Ben-Ari, Jacob Goldberger

arXiv_CV

arXiv_CV Weakly_Supervised Classification Deep_Learning Detection
Abstract

The high cost of generating expert annotations poses a strong limitation for supervised machine learning methods in medical imaging. Weakly supervised methods may provide a solution to this tangle. In this study, we propose a novel deep learning architecture for multi-class classification of mammograms according to the severity of the anomalies they contain, having only a global tag over the image. The suggested scheme further allows localization of the different types of findings in full resolution. The new scheme contains a dual branch network that combines region-level classification with region ranking. We evaluate our method on a large multi-center mammography dataset including $\sim$3,000 mammograms with various anomalies and demonstrate the advantages of the proposed method over a previous weakly-supervised strategy.

URL

http://arxiv.org/abs/1904.12319

PDF

http://arxiv.org/pdf/1904.12319
Read All
