Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

A novel geometrically inspired polynomial kernel for robot inverse dynamics

2019-04-30

Alberto Dalla Libera, Ruggero Carli

arXiv_RO

arXiv_RO
Abstract

In this paper we introduce a novel data-driven inverse dynamics estimator based on Gaussian Process Regression. Driven by the fact that the inverse dynamics can be described as a polynomial function on a suitable input space, we propose the use of a polynomial kernel, based on a set of parameters which is different from the one typically considered in the literature. This novel parametrization allows for greater flexibility in selecting only the information needed to model the complexity of the problem. We tested the proposed approach in a simulated environment and in real experiments with a UR10 robot. The results confirm that, compared to standard data-driven estimators, the proposed approach is more data efficient and exhibits better generalization properties.
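
To make the idea of Gaussian Process Regression with a polynomial kernel concrete, here is a minimal sketch. It uses the textbook polynomial kernel, not the paper's new parametrization, and the target function, sample sizes and noise level are all assumptions chosen for illustration.

```python
import numpy as np


def polynomial_kernel(A, B, degree=2, offset=1.0):
    """Textbook polynomial kernel k(a, b) = (a . b + offset) ** degree."""
    return (A @ B.T + offset) ** degree


def gpr_predict(X_train, y_train, X_test, degree=2, noise_var=1e-6):
    """Posterior mean of a zero-mean GP: K_* (K + noise_var * I)^-1 y."""
    K = polynomial_kernel(X_train, X_train, degree)
    K_star = polynomial_kernel(X_test, X_train, degree)
    alpha = np.linalg.solve(K + noise_var * np.eye(len(X_train)), y_train)
    return K_star @ alpha


def toy_target(X):
    """Hypothetical degree-2 polynomial standing in for a torque function."""
    return 1.5 * X[:, 0] ** 2 - 0.5 * X[:, 0] * X[:, 1] + 0.3 * X[:, 1]


rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(50, 2))
X_test = rng.uniform(-1.0, 1.0, size=(5, 2))
y_pred = gpr_predict(X_train, toy_target(X_train), X_test)
max_error = float(np.max(np.abs(y_pred - toy_target(X_test))))
print("largest absolute test error:", max_error)
```

Because the toy target is itself a degree-2 polynomial, a degree-2 kernel can represent it, so the test error should be very small. This is the intuition behind matching the kernel to the polynomial structure of the problem.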

URL

http://arxiv.org/abs/1904.13317

PDF

http://arxiv.org/pdf/1904.13317
On Social Machines for Algorithmic Regulation

2019-04-30

Nello Cristianini, Teresa Scantamburlo

arXiv_AI

arXiv_AI GAN
Abstract

Autonomous mechanisms have been proposed to regulate certain aspects of society and are already being used to regulate business organisations. We take seriously recent proposals for algorithmic regulation of society, and we identify the existing technologies that can be used to implement them, most of them originally introduced in business contexts. We build on the notion of ‘social machine’ and we connect it to various ongoing trends and ideas, including crowdsourced task-work, social compiler, mechanism design, reputation management systems, and social scoring. After showing how all the building blocks of algorithmic regulation are already well in place, we discuss possible implications for human autonomy and social order. The main contribution of this paper is to identify convergent social and technical trends that are leading towards social regulation by algorithms, and to discuss the possible social, political, and ethical consequences of taking this path.

URL

http://arxiv.org/abs/1904.13316

PDF

http://arxiv.org/pdf/1904.13316
Unsupervised automatic classification of Scanning Electron Microscopy images of CD4+ cells with varying extent of HIV virion infection

2019-04-30

John M. Wandeto, Birgitta Dresp-Langley

arXiv_AI

arXiv_AI GAN Classification
Abstract

Archiving large sets of medical or cell images in digital libraries may require ordering randomly scattered sets of image data according to specific criteria, such as the spatial extent of a specific local color or contrast content that reveals different meaningful states of a physiological structure, tissue, or cell in a certain order, indicating progression or recession of a pathology, or the progressive response of a cell structure to treatment. Here we used a Self-Organized Map (SOM)-based, fully automatic and unsupervised classification procedure described in our earlier work and applied it to sets of minimally processed grayscale and/or color-processed Scanning Electron Microscopy (SEM) images of CD4+ T-lymphocytes (so-called helper cells) with varying extent of HIV virion infection. It is shown that the quantization error in the SOM output after training makes it possible to scale the spatial magnitude and the direction of change (+ or -) in local pixel contrast or color across images of a series with a reliability that exceeds that of any human expert. The procedure is easily implemented and fast, and represents a promising step towards low-cost automatic digital image archiving with minimal intervention of a human operator.
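
The quantization error mentioned above can be illustrated with a small sketch: train a SOM on the pixels of one image, then measure the mean distance between the pixels of another image and their best-matching units. The SOM below is a generic minimal implementation, not the authors' code, and the grid size, learning schedule and synthetic "pixels" are assumptions.

```python
import numpy as np


def train_som(data, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM; units are initialised from randomly chosen samples."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    n_units = rows * cols
    weights = data[rng.integers(0, len(data), size=n_units)].astype(float)
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                   # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3      # shrinking neighbourhood
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def quantization_error(data, weights):
    """Mean distance between each sample and its best-matching unit."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


rng = np.random.default_rng(1)
reference = rng.uniform(0.2, 0.4, size=(200, 3))   # hypothetical RGB pixels
brighter = rng.uniform(0.6, 0.8, size=(200, 3))    # same scene, shifted contrast
som = train_som(reference)
qe_reference = quantization_error(reference, som)
qe_brighter = quantization_error(brighter, som)
print(qe_reference, qe_brighter)
```

The error grows when the second image departs from the one the map was trained on, which is what lets it serve as a scalar measure of change across an image series.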

URL

http://arxiv.org/abs/1905.03700

PDF

http://arxiv.org/pdf/1905.03700
Segmentation is All You Need

2019-04-30

Yuxiang Wu, Zehua Cheng, Zhenghua Xu, Weiyang Wang

arXiv_CV

arXiv_CV Object_Detection Knowledge Segmentation Face Detection
Abstract

We propose a new paradigm for the detection task that is anchor-box free and NMS free. Although state-of-the-art models based on region proposal methods have been well established for years, NMS, on which the RPN relies, cannot solve the problem of low recall under complicated occlusion. We propose to use weakly supervised multimodal segmentation annotations to achieve highly robust object detection without NMS, using poorly annotated bounding boxes to detect objects robustly in such difficult circumstances. We avoid all hyperparameters related to anchor boxes and NMS. Our proposed model outperforms previous anchor-based one-stage and multi-stage detectors while being much simpler, and reaches state-of-the-art performance in both accuracy and recall.

URL

http://arxiv.org/abs/1904.13300

PDF

http://arxiv.org/pdf/1904.13300
Governance by Glass-Box: Implementing Transparent Moral Bounds for AI Behaviour

2019-04-30

Andrea Aler Tubella, Andreas Theodorou, Virginia Dignum, Frank Dignum

arXiv_AI

arXiv_AI Knowledge
Abstract

Artificial Intelligence (AI) applications are being used to predict and assess behaviour in multiple domains, such as criminal justice and consumer finance, which directly affect human well-being. However, if AI is to improve people’s lives, then people must be able to trust AI, which means being able to understand what the system is doing and why. Even though transparency is often seen as the requirement in this case, realistically it might not always be possible or desirable, whereas the need to ensure that the system operates within set moral bounds remains. In this paper, we present an approach to evaluate the moral bounds of an AI system based on the monitoring of its inputs and outputs. We place a “glass box” around the system by mapping moral values into explicit verifiable norms that constrain inputs and outputs, in such a way that if these remain within the box we can guarantee that the system adheres to the value. The focus on inputs and outputs allows for the verification and comparison of vastly different intelligent systems; from deep neural networks to agent-based systems. The explicit transformation of abstract moral values into concrete norms brings great benefits in terms of explainability; stakeholders know exactly how the system is interpreting and employing relevant abstract moral human values and calibrate their trust accordingly. Moreover, by operating at a higher level we can check the compliance of the system with different interpretations of the same value. These advantages will have an impact on the well-being of AI systems users at large, building their trust and providing them with concrete knowledge on how systems adhere to moral values.

URL

http://arxiv.org/abs/1905.04994

PDF

http://arxiv.org/pdf/1905.04994
Performing Structured Improvisations with pre-trained Deep Learning Models

2019-04-30

Pablo Samuel Castro

arXiv_SD

arXiv_SD Deep_Learning
Abstract

The quality of outputs produced by deep generative models for music has seen a dramatic improvement in the last few years. However, most deep learning models perform in “offline” mode, with few restrictions on the processing time. Integrating these types of models into a live structured performance poses a challenge because of the necessity to respect the beat and harmony. Further, these deep models tend to be agnostic to the style of a performer, which often renders them impractical for live performance. In this paper we propose a system which enables the integration of out-of-the-box generative models by leveraging the musician’s creativity and expertise.

URL

http://arxiv.org/abs/1904.13285

PDF

http://arxiv.org/pdf/1904.13285
Occupancy Networks: Learning 3D Reconstruction in Function Space

2019-04-30

Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, Andreas Geiger

arXiv_CV

arXiv_CV Face Quantitative
Abstract

With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose Occupancy Networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.
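
The idea of a surface defined as the decision boundary of a classifier can be sketched without any training. Below, an analytic function stands in for the trained network (an assumption made purely for illustration): it returns the probability that a point lies inside a sphere, and the surface is wherever that probability crosses 0.5. The same function can be queried on a grid of any resolution.

```python
import numpy as np


def occupancy(points, radius=0.5, sharpness=50.0):
    """Stand-in for a trained network: probability that each point is inside a sphere."""
    signed_dist = radius - np.linalg.norm(points, axis=-1)
    return 1.0 / (1.0 + np.exp(-sharpness * signed_dist))


def occupied_fraction(resolution, threshold=0.5):
    """Query the continuous function on a resolution^3 grid over [-1, 1]^3."""
    axis = np.linspace(-1.0, 1.0, resolution)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    return float((occupancy(grid) > threshold).mean())


for resolution in (16, 32, 64):
    print(resolution, occupied_fraction(resolution))
```

Nothing about the representation changes between resolutions; only the sampling does. As the grid gets finer, the occupied fraction approaches the sphere-to-cube volume ratio, pi / 48.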

URL

http://arxiv.org/abs/1812.03828

PDF

http://arxiv.org/pdf/1812.03828
CT-To-MR Conditional Generative Adversarial Networks for Ischemic Stroke Lesion Segmentation

2019-04-30

Jonathan Rubin, S. Mazdak Abulnaga

arXiv_CV

arXiv_CV Adversarial Segmentation GAN CNN Quantitative
Abstract

Infarcted brain tissue resulting from acute stroke readily shows up as hyperintense regions within diffusion-weighted magnetic resonance imaging (DWI). It has also been proposed that computed tomography perfusion (CTP) could alternatively be used to triage stroke patients, given improvements in speed and availability, as well as reduced cost. However, CTP has a lower signal to noise ratio compared to MR. In this work, we investigate whether a conditional mapping can be learned by a generative adversarial network to map CTP inputs to generated MR DWI that more clearly delineates hyperintense regions due to ischemic stroke. We detail the architectures of the generator and discriminator and describe the training process used to perform image-to-image translation from multi-modal CT perfusion maps to diffusion weighted MR outputs. We evaluate the results both qualitatively by visual comparison of generated MR to ground truth, as well as quantitatively by training fully convolutional neural networks that make use of generated MR data inputs to perform ischemic stroke lesion segmentation. Segmentation networks trained using generated CT-to-MR inputs result in at least some improvement on all metrics used for evaluation, compared with networks that only use CT perfusion input.

URL

http://arxiv.org/abs/1904.13281

PDF

http://arxiv.org/pdf/1904.13281
Machine Decisions and Human Consequences

2019-04-30

Teresa Scantamburlo, Andrew Charlesworth, Nello Cristianini

arXiv_AI

arXiv_AI Classification Prediction Relation
Abstract

As we increasingly delegate decision-making to algorithms, whether directly or indirectly, important questions emerge in circumstances where those decisions have direct consequences for individual rights and personal opportunities, as well as for the collective good. A key problem for policymakers is that the social implications of these new methods can only be grasped if there is an adequate comprehension of their general technical underpinnings. The discussion here focuses primarily on the case of enforcement decisions in the criminal justice system, but draws on similar situations emerging from other algorithms utilised in controlling access to opportunities, to explain how machine learning works and, as a result, how decisions are made by modern intelligent algorithms or ‘classifiers’. It examines the key aspects of the performance of classifiers, including how classifiers learn, the fact that they operate on the basis of correlation rather than causation, and that the term ‘bias’ in machine learning has a different meaning to common usage. An example of a real world ‘classifier’, the Harm Assessment Risk Tool (HART), is examined, through identification of its technical features: the classification method, the training data and the test data, the features and the labels, validation and performance measures. Four normative benchmarks are then considered by reference to HART: (a) prediction accuracy (b) fairness and equality before the law (c) transparency and accountability (d) informational privacy and freedom of expression, in order to demonstrate how its technical features have important normative dimensions that bear directly on the extent to which the system can be regarded as a viable and legitimate support for, or even alternative to, existing human decision-makers.

URL

http://arxiv.org/abs/1811.06747

PDF

http://arxiv.org/pdf/1811.06747
Incrementally Learned Mixture Models for GNSS Localization

2019-04-30

Tim Pfeifer, Peter Protzel

arXiv_RO

arXiv_RO Knowledge Inference
Abstract

GNSS localization is an important part of today’s autonomous systems, although it suffers from non-Gaussian errors caused by non-line-of-sight effects. Recent methods are able to mitigate these effects by including the corresponding distributions in the sensor fusion algorithm. However, these approaches require prior knowledge about the sensor’s distribution, which is often not available. We introduce a novel sensor fusion algorithm based on variational Bayesian inference, that is able to approximate the true distribution with a Gaussian mixture model and to learn its parametrization online. The proposed Incremental Variational Mixture algorithm automatically adapts the number of mixture components to the complexity of the measurement’s error distribution. We compare the proposed algorithm against current state-of-the-art approaches using a collection of open access real world datasets and demonstrate its superior localization accuracy.
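
As a rough illustration of why a Gaussian mixture suits non-Gaussian measurement errors, the sketch below evaluates a two-component mixture density. It shows only the model form, not the incremental variational learning proposed in the paper, and the weights, means and standard deviations are hypothetical values.

```python
import numpy as np


def gmm_pdf(x, weights, means, stds):
    """Density of a one-dimensional Gaussian mixture evaluated at x."""
    x = np.asarray(x, dtype=float)[..., None]
    weights, means, stds = (np.asarray(a, dtype=float) for a in (weights, means, stds))
    comps = weights * np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))
    return comps.sum(axis=-1)


# Hypothetical range error model in metres: a narrow line-of-sight component
# plus a wide, positively biased non-line-of-sight component.
weights, means, stds = [0.8, 0.2], [0.0, 15.0], [1.0, 10.0]

grid = np.linspace(-100.0, 150.0, 25001)
total_mass = float(gmm_pdf(grid, weights, means, stds).sum() * (grid[1] - grid[0]))

outlier = 30.0
p_mixture = float(gmm_pdf(outlier, weights, means, stds))
p_gaussian = float(gmm_pdf(outlier, [1.0], [0.0], [1.0]))
print(total_mass, p_mixture, p_gaussian)
```

A 30 m error is essentially impossible under the single Gaussian but plausible under the mixture, so an estimator using the mixture is not dragged as far by such outliers.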

URL

http://arxiv.org/abs/1904.13279

PDF

http://arxiv.org/pdf/1904.13279
Detecting Reflections by Combining Semantic and Instance Segmentation

2019-04-30

David Owen, Ping-Lin Chang

arXiv_CV

arXiv_CV Object_Detection Segmentation Face Semantic_Segmentation Detection
Abstract

Reflections in natural images commonly cause false positives in automated detection systems. These false positives can lead to significant impairment of accuracy in the tasks of detection, counting and segmentation. Here, inspired by the recent panoptic approach to segmentation, we show how fusing instance and semantic segmentation can automatically identify reflection false positives, without explicitly needing to have the reflective regions labelled. We explore in detail how state-of-the-art two-stage detectors suffer a loss of broader contextual features, and hence are unable to learn to ignore these reflections. We then present an approach to fuse instance and semantic segmentations for this application, and subsequently show how this reduces false positive detections in real-world surveillance data with a large number of reflective surfaces. This demonstrates how panoptic segmentation and related work, despite being in its infancy, can already be useful in real-world computer vision problems.

URL

http://arxiv.org/abs/1904.13273

PDF

http://arxiv.org/pdf/1904.13273
Non-Rigid Structure-From-Motion by Rank-One Basis Shapes

2019-04-30

Sami S. Brandt, Hanno Ackermann

arXiv_CV

arXiv_CV Face
Abstract

In this paper, we show that the affine, non-rigid structure-from-motion problem can be solved by rank-one, thus degenerate, basis shapes. It is a natural reformulation of the classic low-rank method by Bregler et al., where it was assumed that the deformable 3D structure is generated by a linear combination of rigid basis shapes. The non-rigid shape will be decomposed into the mean shape and the degenerate shapes, constructed from the right singular vectors of the low-rank decomposition. The right singular vectors are affinely back-projected into the 3D space, and the affine back-projections will also be solved as part of the factorisation. By construction, a direct interpretation for the right singular vectors of the low-rank decomposition will also follow: they can be seen as principal components, hence, the first variant of our method is referred to as Rank-1-PCA. The second variant, referred to as Rank-1-ICA, additionally estimates the orthogonal transform which maps the deformation modes into as statistically independent modes as possible. It has the advantage of pinpointing statistically dependent subspaces related to, for instance, lip movements on human faces. Moreover, in contrast to prior works, no predefined dimensionality for the subspaces is imposed. The experiments on several datasets show that the method achieves better results than the state-of-the-art, it can be computed faster, and it provides an intuitive interpretation for the deformation modes.

URL

http://arxiv.org/abs/1904.13271

PDF

http://arxiv.org/pdf/1904.13271
Country-wide high-resolution vegetation height mapping with Sentinel-2

2019-04-30

Nico Lang, Konrad Schindler, Jan Dirk Wegner

arXiv_CV

arXiv_CV Face CNN
Abstract

Sentinel-2 multi-spectral images collected over periods of several months were used to estimate vegetation height for Gabon and Switzerland. A deep convolutional network was trained to extract suitable spectral and textural features from reflectance images and to regress per-pixel vegetation height. In Gabon, reference heights for training and validation were derived from airborne LiDAR measurements. In Switzerland, reference heights were taken from an existing canopy height model derived via photogrammetric surface reconstruction. The resulting maps have a mean absolute error (MAE) of 1.7 m in Switzerland and 4.3 m in Gabon, and correctly reproduce vegetation heights up to more than 50 m. They also show good qualitative agreement with existing vegetation height maps. Our work demonstrates that, given a moderate amount of reference data, dense vegetation height maps with 10 m ground sampling distance (GSD) can be derived at country scale from Sentinel-2 imagery.

URL

http://arxiv.org/abs/1904.13270

PDF

http://arxiv.org/pdf/1904.13270
Handwritten Chinese Font Generation with Collaborative Stroke Refinement

2019-04-30

Chuan Wen, Jie Chang, Ya Zhang

arXiv_CV

arXiv_CV Face CNN
Abstract

Automatic character generation is an appealing solution for new typeface design, especially for Chinese typefaces, which include over 3700 commonly used characters. This task has two main pain points: (i) handwritten characters usually have thin strokes that carry little information and complex structures that are error prone during deformation; (ii) thousands of characters with various shapes need to be synthesized from a few manually designed characters. To solve these issues, we propose a novel convolutional-neural-network-based model with three main techniques: collaborative stroke refinement, which uses a collaborative training strategy to recover missing or broken strokes; online zoom-augmentation, which takes advantage of the content-reuse phenomenon to reduce the size of the training set; and adaptive pre-deformation, which standardizes and aligns the characters. The proposed model needs only 750 paired training samples; no pre-trained network, extra dataset resource or labels are needed. Experimental results show that the proposed method significantly outperforms state-of-the-art methods under the practical restrictions of handwritten font synthesis.

URL

http://arxiv.org/abs/1904.13268

PDF

http://arxiv.org/pdf/1904.13268
Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors

2019-04-30

Vitalii Zhelezniak, Aleksandar Savkov, April Shen, Francesco Moramarco, Jack Flann, Nils Y. Hammerla

arXiv_CL

arXiv_CL Embedding Deep_Learning
Abstract

Recent literature suggests that averaged word vectors followed by simple post-processing outperform many deep learning methods on semantic textual similarity tasks. Furthermore, when averaged word vectors are trained supervised on large corpora of paraphrases, they achieve state-of-the-art results on standard STS benchmarks. Inspired by these insights, we push the limits of word embeddings even further. We propose a novel fuzzy bag-of-words (FBoW) representation for text that contains all the words in the vocabulary simultaneously but with different degrees of membership, which are derived from similarities between word vectors. We show that max-pooled word vectors are only a special case of fuzzy BoW and should be compared via fuzzy Jaccard index rather than cosine similarity. Finally, we propose DynaMax, a completely unsupervised and non-parametric similarity measure that dynamically extracts and max-pools good features depending on the sentence pair. This method is both efficient and easy to implement, yet outperforms current baselines on STS tasks by a large margin and is even competitive with supervised word vectors trained to directly optimise cosine similarity.
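
A compact sketch of the DynaMax idea as the abstract describes it: the word vectors of both sentences together form the "universe", each sentence is turned into a fuzzy membership vector by max-pooling its similarities to that universe, and the two vectors are compared with the fuzzy Jaccard index. The random vectors are placeholders for real word embeddings, and the details (dot-product similarity, clipping at zero) are assumptions of this sketch rather than a verified copy of the authors' implementation.

```python
import numpy as np


def fuzzify(sentence, universe):
    """Membership in each universe element: max-pooled, zero-clipped dot products."""
    similarities = sentence @ universe.T          # shape (n_words, n_universe)
    return np.maximum(similarities.max(axis=0), 0.0)


def dynamax_jaccard(x, y):
    """Fuzzy Jaccard index between two sentences given as word-vector matrices."""
    universe = np.vstack((x, y))                  # built per sentence pair
    m_x = fuzzify(x, universe)
    m_y = fuzzify(y, universe)
    return float(np.minimum(m_x, m_y).sum() / np.maximum(m_x, m_y).sum())


rng = np.random.default_rng(0)
sentence_a = rng.normal(size=(5, 50))   # 5 words, 50-dimensional placeholder vectors
sentence_b = rng.normal(size=(7, 50))   # 7 words

same = dynamax_jaccard(sentence_a, sentence_a)
different = dynamax_jaccard(sentence_a, sentence_b)
print(same, different)
```

The measure needs no training and no parameters: a sentence compared with itself scores 1, and the score falls toward 0 as the membership vectors diverge.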

URL

http://arxiv.org/abs/1904.13264

PDF

http://arxiv.org/pdf/1904.13264
Interfacing PDM sensors with PFM spiking systems: application for Neuromorphic Auditory Sensors

2019-04-30

Angel Jimenez-Fernandez, Juan Pedro Dominguez-Morales, Daniel Gutierrez-Galan, Antonio Rios-Navarro, Ricardo Tapiador-Morales, Alejandro Linares-Barranco

arXiv_SD

arXiv_SD Face
Abstract

In this paper we present a sub-system to convert audio information from low-power MEMS microphones with pulse density modulation (PDM) output into rate-coded spike streams. These spikes represent the input signal of a Neuromorphic Auditory Sensor (NAS), which is implemented with Spike Signal Processing (SSP) building blocks. For this conversion, we have designed an HDL component for FPGAs that interfaces with PDM microphones and converts their pulses to temporally distributed spikes following a pulse frequency modulation (PFM) scheme with an accurate, configurable Inter-Spike-Interval. The new FPGA component has been tested in two scenarios: first as a stand-alone circuit for its characterization, and then integrated with a full NAS design to verify its behavior. This PDM interface demands less than 1% of the resources of a Spartan 6 FPGA and has a power consumption below 5 mW.

URL

http://arxiv.org/abs/1905.00390

PDF

http://arxiv.org/pdf/1905.00390
English Broadcast News Speech Recognition by Humans and Machines

2019-04-30

Samuel Thomas, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltan Tuske, George Saon, Brian Kingsbury, Michael Picheny, Tom Dibert, Alice Kaiser-Schatzlein, Bern Samko

arXiv_CL

arXiv_CL Attention Speech_Recognition RNN Deep_Learning Language_Model Recognition
Abstract

With recent advances in deep learning, considerable attention has been given to achieving automatic speech recognition performance close to human performance on tasks like conversational telephone speech (CTS) recognition. In this paper we evaluate the usefulness of these proposed techniques on broadcast news (BN), a similarly challenging task. We also perform a set of recognition measurements to understand how close the achieved automatic speech recognition results are to human performance on this task. On two publicly available BN test sets, DEV04F and RT04, our speech recognition system using LSTM and residual network based acoustic models with a combination of n-gram and neural network language models performs at 6.5% and 5.9% word error rate. By achieving new performance milestones on these test sets, our experiments show that techniques developed on other related tasks, like CTS, can be transferred to achieve similar performance. In contrast, the best measured human word error rate on these test sets is much lower, at 3.6% and 2.8% respectively, indicating that there is still room for new techniques and improvements in this space to reach human performance levels.
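
For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. A minimal sketch follows; the whitespace tokenization and the example sentences are assumptions, and real evaluations normally apply text normalization first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))          # distances for an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


reference = "the cat sat on the mat"
print(word_error_rate(reference, "the cat sat on mat"))      # one deletion
print(word_error_rate(reference, "the bat sat on the mat"))  # one substitution
```

Both example hypotheses contain one error against six reference words, so each scores 1/6, roughly 16.7%.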

URL

http://arxiv.org/abs/1904.13258

PDF

http://arxiv.org/pdf/1904.13258
Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization

2019-04-30

Simon Leglaive, Laurent Girin, Radu Horaud

arXiv_SD

arXiv_SD
Abstract

In this paper we address speaker-independent multichannel speech enhancement in unknown noisy environments. Our work is based on a well-established multichannel local Gaussian modeling framework. We propose to use a neural network for modeling the speech spectro-temporal content. The parameters of this supervised model are learned using the framework of variational autoencoders. The noisy recording environment is assumed to be unknown, so the noise spectro-temporal modeling remains unsupervised and is based on non-negative matrix factorization (NMF). We develop a Monte Carlo expectation-maximization algorithm and we experimentally show that the proposed approach outperforms its NMF-based counterpart, where speech is modeled using supervised NMF.
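
As background on the NMF building block, the sketch below factorizes a non-negative matrix with the classic multiplicative updates for the Euclidean cost. This is a generic textbook variant shown for illustration; the paper's noise model and cost function may differ, and the matrix sizes and rank are arbitrary assumptions.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate V (non-negative) as W @ H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update spectral templates
    return W, H


rng = np.random.default_rng(42)
# Hypothetical "spectrogram": 20 frequency bins, 30 frames, built from 3 patterns.
V = rng.random((20, 3)) @ rng.random((3, 30))

W1, H1 = nmf(V, rank=3, n_iter=1)
W, H = nmf(V, rank=3, n_iter=200)
error_early = float(np.linalg.norm(V - W1 @ H1))
error_final = float(np.linalg.norm(V - W @ H))
print(error_early, error_final)
```

The updates only ever multiply by non-negative ratios, so W and H stay non-negative throughout, which is what makes the factors interpretable as additive spectral patterns and their activations.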

URL

http://arxiv.org/abs/1811.06713

PDF

http://arxiv.org/pdf/1811.06713
Generative Adversarial Imagination for Sample Efficient Deep Reinforcement Learning

2019-04-30

Kacper Kielak

arXiv_AI

arXiv_AI Adversarial Reinforcement_Learning Deep_Learning
Abstract

Reinforcement learning has seen great advancements in the past five years. The successful introduction of deep learning in place of more traditional methods allowed reinforcement learning to scale to very complex domains, achieving super-human performance in environments like the game of Go or numerous video games. Despite great successes in multiple domains, these new methods suffer from their own issues that often make them inapplicable to real-world problems. Extreme lack of data efficiency, together with huge variance and difficulty in enforcing safety constraints, is one of the three most prominent issues in the field. Usually, millions of data points sampled from the environment are necessary for these algorithms to converge to acceptable policies. This thesis proposes the novel Generative Adversarial Imaginative Reinforcement Learning algorithm. It takes advantage of the recent introduction of highly effective generative adversarial models, and of the Markov property that underpins the reinforcement learning setting, to model the dynamics of the real environment within an internal imagination module. Rollouts from the imagination are then used to artificially simulate the real environment in a standard reinforcement learning process, avoiding the often expensive and dangerous trial and error in the real environment. Experimental results show that the proposed algorithm uses experience from the real environment more economically than the current state-of-the-art Rainbow DQN algorithm, and thus makes an important step towards sample-efficient deep reinforcement learning.

URL

http://arxiv.org/abs/1904.13255

PDF

http://arxiv.org/pdf/1904.13255
Online Causal Structure Learning in the Presence of Latent Variables

2019-04-30

Durdane Kocacoban, James Cussens

arXiv_AI

arXiv_AI Relation
Abstract

We present two online causal structure learning algorithms which can track changes in a causal structure and process data in a dynamic real-time manner. Standard causal structure learning algorithms assume that the causal structure does not change during the data collection process, but in real-world scenarios it often does. Therefore, it is inappropriate to handle such changes with existing batch-learning approaches; instead, a structure should be learned in an online manner. The online causal structure learning algorithms we present here can revise correlation values without reprocessing the entire dataset and use an existing model to avoid relearning the causal links in the prior model that still fit the data. The proposed algorithms are tested on synthetic and real-world datasets, the latter being a seasonally adjusted commodity price index dataset for the U.S. The online causal structure learning algorithms outperformed standard FCI by a large margin in learning the changed causal structure correctly and efficiently when latent variables were present.

URL

http://arxiv.org/abs/1904.13247

PDF

http://arxiv.org/pdf/1904.13247
Alignment-Free Cross-Sensor Fingerprint Matching based on the Co-Occurrence of Ridge Orientations and Gabor-HoG Descriptor

2019-04-30

Helala AlShehri, Muhammad Hussain, Hatim AboAlSamh, Qazi Emad-ul-Haq, Aqil M. Azmi

arXiv_CV

arXiv_CV Relation
Abstract

The existing automatic fingerprint verification methods are designed to work under the assumption that the same sensor is installed for enrollment and authentication (regular matching). There is a remarkable decrease in efficiency when one type of contact-based sensor is employed for enrollment and another type of contact-based sensor is used for authentication (the cross-matching or fingerprint sensor interoperability problem). The ridge orientation patterns in a fingerprint are invariant to sensor type. Based on this observation, we propose a robust fingerprint descriptor called the co-occurrence of ridge orientations (Co-Ror), which encodes the spatial distribution of ridge orientations. Employing this descriptor, we introduce an efficient automatic fingerprint verification method for the cross-matching problem. Further, to enhance the robustness of the method, we incorporate scale-based ridge orientation information through the Gabor-HoG descriptor. The two descriptors are fused with canonical correlation analysis (CCA), and the matching score between two fingerprints is calculated using city-block distance. The proposed method is alignment-free and can handle the matching process without the need for a registration step. Intensive experiments on two benchmark databases (FingerPass and MOLF) show the effectiveness of the method and reveal its significant enhancement over state-of-the-art methods such as VeriFinger (a commercial SDK), minutia cylinder-code (MCC), MCC with scale, and the thin-plate spline (TPS) model. The proposed research will help security agencies, service providers and law-enforcement departments to overcome the interoperability problem of contact sensors of different technology and interaction types.
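
The final matching step can be pictured with a small sketch: the city-block (L1) distance between two descriptor vectors, followed by a threshold decision. The descriptors and the threshold below are made-up values; in the paper the vectors would come from the fused Co-Ror and Gabor-HoG descriptors.

```python
import numpy as np


def city_block_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.abs(a - b).sum())


def is_match(a, b, threshold):
    """Accept the pair when the descriptors are closer than a chosen threshold."""
    return city_block_distance(a, b) < threshold


enrolled = [0.20, 0.50, 0.30]   # hypothetical fused descriptor at enrollment
probe = [0.25, 0.45, 0.30]      # same finger on a different sensor
impostor = [0.70, 0.10, 0.20]   # a different finger

print(city_block_distance(enrolled, probe), city_block_distance(enrolled, impostor))
print(is_match(enrolled, probe, threshold=0.3), is_match(enrolled, impostor, threshold=0.3))
```

Because the distance compares descriptors rather than aligned minutiae, no registration step is needed before scoring.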

URL

http://arxiv.org/abs/1905.03699

PDF

http://arxiv.org/pdf/1905.03699
Acoustic Probing for Estimating the Storage Time and Firmness of Tomatoes and Mandarin Oranges

2019-04-30

Hidetomo Kataoka (1), Takashi Ijiri (2), Kohei Matsumura (1), Jeremy White (1), Akira Hirabayashi (1) ((1) Graduate School of Information Science and Engineering, Ritsumeikan University, Kusatsu, Shiga, Japan, (2) College of Engineering, Shibaura Institute of Technology, Toyosu, Tokyo, Japan)

arXiv_SD

arXiv_SD
Abstract

This paper introduces an acoustic probing technique to estimate the storage time and firmness of fruits; we emit an acoustic signal toward the fruit from a small speaker and capture the reflected signal with a tiny microphone. We collect reflected signals for fruits with various storage times and firmness conditions, using them to train regressors for estimation. To evaluate the feasibility of our acoustic probing, we performed experiments; we prepared 162 tomatoes and 153 mandarin oranges, collected their reflected signals using our developed device and measured their firmness with a fruit firmness tester, for a period of 35 days for tomatoes and 60 days for mandarin oranges. We performed cross validation using this data set. The average estimation errors of storage time and firmness for tomatoes were 0.89 days and 9.47 g/mm². Those for mandarin oranges were 1.67 days and 15.67 g/mm². The estimation of storage time was sufficiently accurate for casual users to select fruits in their favorite condition at home. In the experiments, we tested four different acoustic probes and found that sweep signals provide highly accurate estimation results.

URL

http://arxiv.org/abs/1809.10581

PDF

http://arxiv.org/pdf/1809.10581
Read All
GaborNet: Gabor filters with learnable parameters in deep convolutional neural networks

2019-04-30

Andrey Alekseev, Anatoly Bobe

arXiv_CV

arXiv_CV CNN Recognition
Abstract

The article describes a system for image recognition using deep convolutional neural networks. A modified network architecture is proposed that focuses on improving convergence and reducing training complexity. The filters in the first layer of the network are constrained to fit the Gabor function. The parameters of the Gabor functions are learnable and are updated by standard backpropagation techniques. The system was implemented in Python, tested on several datasets, and outperformed common convolutional networks.
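To make the idea of a Gabor-constrained filter concrete, here is a minimal sketch that samples the standard real-valued Gabor function on a small grid. The parameter names (`sigma`, `theta`, `lambd`, `gamma`, `psi`) follow common usage and are assumptions, not the paper's notation; in a network such as the one described, these values would be the learnable quantities rather than fixed numbers.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=1.0, psi=0.0):
    """Sample a real Gabor function on a size x size grid (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope multiplied by a cosine carrier.
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# A 5x5 horizontal-frequency filter with hypothetical parameter values.
kernel = gabor_kernel(size=5, sigma=2.0, theta=0.0, lambd=4.0)
```

Each filter is described by five numbers instead of `size * size` free weights, which is one plausible reading of the reduced training complexity the abstract mentions.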

URL

http://arxiv.org/abs/1904.13204

PDF

http://arxiv.org/pdf/1904.13204
Read All
Separation of water and fat signal in whole-body gradient echo scans using convolutional neural networks

2019-04-30

Jonathan Andersson, Håkan Ahlström, Joel Kullberg

arXiv_CV

arXiv_CV CNN Inference
Abstract

Purpose: To perform and evaluate water-fat signal separation of whole-body gradient echo scans using convolutional neural networks. Methods: Whole-body gradient echo scans of 240 subjects, each consisting of 5 bipolar echoes, were used. Reference fat fraction maps were created using a conventional method. Convolutional neural networks, more specifically 2D U-nets, were trained using 5-fold cross-validation with 1 or several echoes as input, using the squared difference between the output and the reference fat fraction maps as the loss function. The outputs of the networks were assessed by the loss function, measured liver fat fractions, and visually. Training was performed using a graphics processing unit (GPU). Inference was performed using the GPU as well as a central processing unit (CPU). Results: The loss curves indicated convergence, and the final loss of the validation data decreased when using more echoes as input. The liver fat fractions could be estimated using only 1 echo, but results were improved by use of more echoes. Visual assessment found the quality of the outputs of the networks to be similar to the reference even when using only 1 echo, with slight improvements when using more echoes. Training a network took at most 28.6 h. Inference time of a whole-body scan took at most 3.7 s using the GPU and 5.8 min using the CPU. Conclusion: It is possible to perform water-fat signal separation of whole-body gradient echo scans using convolutional neural networks. Separation was possible using only 1 echo, although using more echoes improved the results.
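The loss described in the Methods can be sketched as follows. This assumes the common signal fat fraction definition F / (W + F) and averages the squared differences over voxels; the abstract only states that the squared difference to the reference fat fraction map is used, so the averaging, the flat-list layout, and the function names are illustrative assumptions.

```python
def fat_fraction(water, fat, eps=1e-12):
    """Per-voxel fat fraction F / (W + F) for flat lists of signal magnitudes."""
    return [f / (w + f + eps) for w, f in zip(water, fat)]


def mean_squared_difference(output, reference):
    """Mean of squared per-voxel differences between two fat fraction maps."""
    if len(output) != len(reference):
        raise ValueError("maps must have the same number of voxels")
    return sum((o - r) ** 2 for o, r in zip(output, reference)) / len(output)


# Hypothetical two-voxel example: reference map from separated signals,
# and a network output that is slightly off in the second voxel.
reference = fat_fraction(water=[3.0, 1.0], fat=[1.0, 1.0])
loss = mean_squared_difference([0.25, 0.40], reference)
```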

URL

http://arxiv.org/abs/1812.04922

PDF

http://arxiv.org/pdf/1812.04922
Read All
JND-SalCAR: A Novel JND-based Saliency-Channel Attention Residual Network for Image Quality Prediction

2019-04-30

Soomin Seo, Sehwan Ki, Munchurl Kim

arXiv_CV

arXiv_CV Salient Knowledge QA Attention CNN Optimization Prediction
Abstract

In image quality enhancement processing, it is most important to predict how humans perceive processed images, since human observers are the ultimate receivers of the images. Thus, objective image quality assessment (IQA) methods based on human visual sensitivity from psychophysical experiments have been extensively studied. Thanks to the power of deep convolutional neural networks (CNN), many CNN-based IQA models have been studied. However, previous CNN-based IQA models have not fully utilized the characteristics of the human visual system (HVS) for IQA problems, simply entrusting everything to the CNN, where the models are often trained as a regressor to predict the scores of subjective quality assessment obtained from IQA datasets. In this paper, we propose a novel JND-based saliency-channel attention residual network for image quality assessment, called JND-SalCAR, where human psychophysical characteristics such as visual saliency and just noticeable difference (JND) are effectively incorporated. We propose a new SalCAR block so that perceptually important features can be extracted by using a saliency-based spatial attention and a channel attention. In addition, the visual saliency map is further used as a guideline for predicting the patch weight map in order to afford stable training of end-to-end optimization for the JND-SalCAR. To the best of our knowledge, our work is the first HVS-inspired trainable IQA network that considers both the visual saliency and JND characteristics of the HVS. We evaluate the proposed JND-SalCAR on large IQA datasets, where it outperforms all the recent state-of-the-art IQA methods.

URL

http://arxiv.org/abs/1902.05316

PDF

http://arxiv.org/pdf/1902.05316
Read All
Semantic Referee: A Neural-Symbolic Framework for Enhancing Geospatial Semantic Segmentation

2019-04-30

Marjan Alirezaie, Martin Längkvist, Michael Sioutis, Amy Loutfi

arXiv_AI

arXiv_AI Knowledge Segmentation Semantic_Segmentation Relation
Abstract

Understanding why machine learning algorithms may fail is usually the task of the human expert that uses domain knowledge and contextual information to discover systematic shortcomings in either the data or the algorithm. In this paper, we propose a semantic referee, which is able to extract qualitative features of the errors emerging from deep machine learning frameworks and suggest corrections. The semantic referee relies on ontological reasoning about spatial knowledge in order to characterize errors in terms of their spatial relations with the environment. Using semantics, the reasoner interacts with the learning algorithm as a supervisor. In this paper, the proposed method of the interaction between a neural network classifier and a semantic referee shows how to improve the performance of semantic segmentation for satellite imagery data.

URL

http://arxiv.org/abs/1904.13196

PDF

http://arxiv.org/pdf/1904.13196
Read All
Using cameras for precise measurement of two-dimensional plant features

2019-04-30

Amy Tabb, Germán A Holguín, Rachel Naegele

arXiv_CV

arXiv_CV
Abstract

Images are used frequently in plant phenotyping to capture measurements. This chapter offers a repeatable method for capturing two-dimensional measurements of plant parts in field or laboratory settings using a variety of camera styles (cellular phone, DSLR), with the addition of a printed calibration pattern. The method is based on calibrating the camera using information available from the EXIF tags from the image, as well as visual information from the pattern. Code is provided to implement the method, as well as a dataset for testing. We include steps to verify protocol correctness by imaging an artifact. The use of this protocol for two-dimensional plant phenotyping will allow data capture from different cameras and environments, with comparison on the same physical scale.

URL

http://arxiv.org/abs/1904.13187

PDF

http://arxiv.org/pdf/1904.13187
Read All
Fine-grained Entity Recognition with Reduced False Negatives and Large Type Coverage

2019-04-30

Abhishek Abhishek, Sanya Bathla Taneja, Garima Malik, Ashish Anand, Amit Awekar

arXiv_CL

arXiv_CL Detection Recognition
Abstract

Fine-grained Entity Recognition (FgER) is the task of detecting and classifying entity mentions into a large set of types spanning diverse domains such as biomedical, finance and sports. We observe that when the type set spans several domains, detection of entity mentions becomes a limitation for supervised learning models. The primary reason is the lack of a dataset in which entity boundaries are properly annotated while covering a large spectrum of entity types. Our work directly addresses this issue. We propose the Heuristics Allied with Distant Supervision (HAnDS) framework to automatically construct a quality dataset suitable for the FgER task. The HAnDS framework exploits the high degree of interlinking between Wikipedia and Freebase in a pipelined manner, reducing annotation errors introduced by naively using the distant supervision approach. Using the HAnDS framework, we create two datasets, one suitable for building FgER systems recognizing up to 118 entity types based on the FIGER type hierarchy and another for up to 1115 entity types based on the TypeNet hierarchy. Our extensive empirical experimentation supports the quality of the generated datasets. Along with this, we also provide a manually annotated dataset for benchmarking FgER systems.

URL

http://arxiv.org/abs/1904.13178

PDF

http://arxiv.org/pdf/1904.13178
Read All
An Argumentation-Based Approach to Assist in the Investigation and Attribution of Cyber-Attacks

2019-04-30

Erisa Karafili, Linna Wang, Emil C. Lupu

arXiv_AI

arXiv_AI
Abstract

We expect an increase in the frequency and severity of cyber-attacks, which comes along with the need for efficient security countermeasures. The process of attributing a cyber-attack helps in constructing efficient and targeted mitigative and preventive security measures. In this work, we propose an argumentation-based reasoner (ABR) that helps the analyst during the analysis of forensic evidence and the attribution process. Given the evidence collected from the cyber-attack, our reasoner helps the analyst to identify who performed the attack and suggests where the analyst should focus further analyses by giving hints about missing evidence or further investigation paths to follow. ABR is the first automatic reasoner that analyzes and attributes cyber-attacks by using technical and social evidence, as well as incomplete and conflicting information. ABR was tested on realistic cyber-attack cases.

URL

http://arxiv.org/abs/1904.13173

PDF

http://arxiv.org/pdf/1904.13173
Read All
Learning Restricted Regular Expressions with Interleaving

2019-04-30

Chunmei Dong, Yeting Li, Haiming Chen

arXiv_AI

arXiv_AI Inference
Abstract

The advantages of having an XML schema for XML documents are numerous. However, many XML documents in practice are not accompanied by a schema or by a valid schema. Relax NG is a popular and powerful schema language, which supports the unconstrained interleaving operator. Focusing on the inference of Relax NG, we propose a new subclass of regular expressions with interleaving and design a polynomial inference algorithm. We then conduct a series of experiments based on large-scale real data and on three XML data corpora. The experimental results show that our subclass is more practical than previous ones, and that the regular expressions inferred by our algorithm are more precise.

URL

http://arxiv.org/abs/1904.13164

PDF

http://arxiv.org/pdf/1904.13164
Read All
Facial Expressions Analysis Under Occlusions Based on Specificities of Facial Motion Propagation

2019-04-30

Delphine Poux, Benjamin Allaert, Jose Mennesson, Nacim Ihaddadene, Ioan Marius Bilasco, Chaabane Djeraba

arXiv_CV

arXiv_CV
Abstract

Although much progress has been made in the facial expression analysis field, facial occlusions are still challenging. The main innovation brought by this contribution consists in exploiting the specificities of facial movement propagation for recognizing expressions in the presence of significant occlusions. The movement induced by an expression extends beyond the movement epicenter. Thus, the movement occurring in an occluded region propagates towards neighboring visible regions. In the presence of occlusions, per expression, we compute the importance of each unoccluded facial region and we construct adapted facial frameworks that boost the performance of the per-expression binary classifier. The output of each expression-dependent binary classifier is then aggregated and fed into a fusion process that aims at constructing, per occlusion, a unique model that recognizes all the facial expressions considered. The evaluations highlight the robustness of this approach in the presence of significant facial occlusions.

URL

http://arxiv.org/abs/1904.13154

PDF

http://arxiv.org/pdf/1904.13154
Read All
PR Product: A Substitute for Inner Product in Neural Networks

2019-04-30

Zhennan Wang, Wenbin Zou, Chen Xu

arXiv_CV

arXiv_CV Image_Caption Caption CNN Image_Classification RNN Classification Deep_Learning
Abstract

In this paper, we analyze the inner product of the weight vector and the input vector in neural networks from the perspective of vector orthogonal decomposition and prove that the local direction gradient of the weight vector decreases as the angle between them gets closer to 0 or $\pi$. We propose the PR Product, a substitute for the inner product, which makes the local direction gradient of the weight vector independent of the angle and consistently larger than the one in the conventional inner product while keeping the forward propagation identical. As the basic operation in neural networks, the PR Product can be applied to many existing deep learning modules, so we develop the PR Product version of the fully connected layer, convolutional layer, and LSTM layer. In static image classification, the experiments on the CIFAR10 and CIFAR100 datasets demonstrate that the PR Product can robustly enhance the ability of various state-of-the-art classification networks. On the task of image captioning, even without any bells and whistles, our PR Product version of the captioning model can compete with or outperform the state-of-the-art models on the MS COCO dataset.
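One way to see the claim about the angle, sketched here with the two norms held fixed (the symbols $w$, $x$ and $\theta$ are chosen for illustration and this is not the paper's full derivation), is to write the inner product in polar form and differentiate with respect to the angle:

$$
w^{\top} x = \lVert w \rVert \, \lVert x \rVert \cos\theta ,
\qquad
\frac{\partial \, (w^{\top} x)}{\partial \theta} = -\lVert w \rVert \, \lVert x \rVert \sin\theta .
$$

Since $\lvert \sin\theta \rvert \to 0$ as $\theta \to 0$ or $\theta \to \pi$, the part of the gradient that changes the direction of $w$ shrinks near those angles, which is consistent with the behaviour the abstract describes for the conventional inner product.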

URL

http://arxiv.org/abs/1904.13148

PDF

http://arxiv.org/pdf/1904.13148
Read All
Incorporating Symbolic Sequential Modeling for Speech Enhancement

2019-04-30

Chien-Feng Liao, Yu Tsao, Xugang Lu, Hisashi Kawai

arXiv_SD

arXiv_SD Knowledge Language_Model
Abstract

In a noisy environment, a lossy speech signal can be automatically restored by a listener if he/she knows the language well. That is, with the built-in knowledge of a “language model”, a listener may effectively suppress noise interference and retrieve the target speech signals. Accordingly, we argue that familiarity with the underlying linguistic content of spoken utterances benefits speech enhancement (SE) in noisy environments. In this study, in addition to the conventional modeling for learning the acoustic noisy-clean speech mapping, an abstract symbolic sequential modeling is incorporated into the SE framework. This symbolic sequential modeling can be regarded as a “linguistic constraint” in learning the acoustic noisy-clean speech mapping function. In this study, the symbolic sequences for acoustic signals are obtained as discrete representations with a Vector Quantized Variational Autoencoder algorithm. The obtained symbols are able to capture high-level phoneme-like content from speech signals. The experimental results demonstrate that the proposed framework can significantly improve the SE performance in terms of perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) on the TIMIT dataset.
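The discrete symbols mentioned in the abstract come from vector quantization. The core lookup can be sketched as mapping each frame vector to the index of its nearest codebook entry. The codebook values and frame vectors below are made up for illustration, and the sketch omits everything else a VQ-VAE involves (encoder, decoder, and the training losses).

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(frames, codebook):
    """Map each frame vector to the index of its nearest codebook vector."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]


# Hypothetical 2-D encoder outputs and a two-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.1, 0.0], [0.9, 1.1], [0.2, 0.1]]
symbols = quantize(frames, codebook)
```

The resulting index sequence is the kind of symbolic sequence that a sequential model could then treat like text.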

URL

http://arxiv.org/abs/1904.13142

PDF

http://arxiv.org/pdf/1904.13142
Read All
Surprising Effectiveness of Few-Image Unsupervised Feature Learning

2019-04-30

Yuki M. Asano, Christian Rupprecht, Andrea Vedaldi

arXiv_CV

arXiv_CV GAN CNN Representation_Learning
Abstract

State-of-the-art methods for unsupervised representation learning can train well the first few layers of standard convolutional neural networks, but they are not as good as supervised learning for deeper layers. This is likely due to the generic and relatively simple nature of shallow layers; and yet, these approaches are applied to millions of images, scalability being advertised as their major advantage since unlabelled data is cheap to collect. In this paper we question this practice and ask whether so many images are actually needed to learn the layers for which unsupervised learning works best. Our main result is that a few or even a single image together with strong data augmentation are sufficient to nearly saturate performance. Specifically, we provide an analysis for three different self-supervised feature learning methods (BiGAN, RotNet, DeepCluster) vs number of training images (1, 10, 1000) and show that we can top the accuracy for the first two convolutional layers of common networks using just a single unlabelled training image and obtain competitive results for other layers. We further study and visualize the learned representation as a function of which (single) image is used for training. Our results are also suggestive of which type of information may be captured by shallow layers in deep networks.

URL

http://arxiv.org/abs/1904.13132

PDF

http://arxiv.org/pdf/1904.13132
Read All
Deep Spectral Clustering using Dual Autoencoder Network

2019-04-30

Xu Yang, Cheng Deng, Feng Zheng, Junchi Yan, Wei Liu

arXiv_CV

arXiv_CV Attention Embedding Relation
Abstract

Clustering methods have recently attracted ever-increasing attention in learning and vision. Deep clustering combines embedding and clustering to obtain an optimal embedding subspace for clustering, which can be more effective than conventional clustering methods. In this paper, we propose a joint learning framework for discriminative embedding and spectral clustering. We first devise a dual autoencoder network, which enforces the reconstruction constraint for the latent representations and their noisy versions, to embed the inputs into a latent space for clustering. As such, the learned latent representations can be more robust to noise. Then mutual information estimation is utilized to provide more discriminative information from the inputs. Furthermore, a deep spectral clustering method is applied to embed the latent representations into the eigenspace and subsequently cluster them, which can fully exploit the relationship between inputs to achieve optimal clustering results. Experimental results on benchmark datasets show that our method can significantly outperform state-of-the-art clustering approaches.

URL

http://arxiv.org/abs/1904.13113

PDF

http://arxiv.org/pdf/1904.13113
Read All
The role of artificial intelligence in achieving the Sustainable Development Goals

2019-04-30

Ricardo Vinuesa, Hossein Azizpour, Iolanda Leite, Madeline Balaam, Virginia Dignum, Sami Domisch, Anna Felländer, Simone Langhans, Max Tegmark, Francesco Fuso Nerini

arXiv_AI

arXiv_AI
Abstract

The emergence of artificial intelligence (AI) and its progressively wider impact on many sectors across the society requires an assessment of its effect on sustainable development. Here we analyze published evidence of positive or negative impacts of AI on the achievement of each of the 17 goals and 169 targets of the 2030 Agenda for Sustainable Development. We find that AI can support the achievement of 128 targets across all SDGs, but it may also inhibit 58 targets. Notably, AI enables new technologies that improve efficiency and productivity, but it may also lead to increased inequalities among and within countries, thus hindering the achievement of the 2030 Agenda. The fast development of AI needs to be supported by appropriate policy and regulation. Otherwise, it would lead to gaps in transparency, accountability, safety and ethical standards of AI-based technology, which could be detrimental towards the development and sustainable use of AI. Finally, there is a lack of research assessing the medium- and long-term impacts of AI. It is therefore essential to reinforce the global debate regarding the use of AI and to develop the necessary regulatory insight and oversight for AI-based technologies.
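The counts in the abstract can be put into proportion with a short piece of arithmetic. This uses only the numbers stated above (169 targets, 128 supported, 58 possibly inhibited); the overlap bound is simple inclusion-exclusion and not a figure reported in the abstract.

$$
\frac{128}{169} \approx 75.7\%, \qquad
\frac{58}{169} \approx 34.3\%, \qquad
128 + 58 - 169 = 17 .
$$

Because the two counts sum to more than the number of targets, at least 17 targets must appear in both groups, that is, AI may both support and inhibit them.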

URL

http://arxiv.org/abs/1905.00501

PDF

http://arxiv.org/pdf/1905.00501
Read All
Deep Learning-based Face Pose Recovery

2019-04-30

Zhaoxiang Liu, Zezhou Chen, Jinqiang Bai, Shaohua Li, Shiguo Lian

arXiv_AI

arXiv_AI Attention Face Pose_Estimation CNN Deep_Learning
Abstract

Facial pose estimation has gained a lot of attention in many practical applications, such as human-robot interaction, gaze estimation and driver monitoring. Meanwhile, end-to-end deep learning-based facial pose estimation is becoming more and more popular. However, facial pose estimation suffers from a key challenge: the lack of sufficient training data for many poses, especially for large poses. Inspired by the observation that faces under close poses look similar, we reformulate facial pose estimation as a label distribution learning problem, considering each face image as an example associated with a Gaussian label distribution rather than a single label, and construct a convolutional neural network which is trained with a multi-loss function on the AFLW and 300WLP datasets to predict the facial poses directly from a color image. Extensive experiments are conducted on several popular benchmarks, including AFLW2000, BIWI, AFLW and AFW, where our approach shows a significant advantage over other state-of-the-art methods.
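The Gaussian label distribution idea can be sketched as follows: instead of a one-hot target at the true angle, neighbouring angle bins also receive probability mass. The bin spacing, the value of `sigma`, and the function names are assumptions chosen for illustration; the abstract does not give these details.

```python
import math


def gaussian_label_distribution(angle, bin_centers, sigma):
    """Discretized Gaussian over pose bins, normalized to sum to 1."""
    weights = [math.exp(-((c - angle) ** 2) / (2 * sigma ** 2)) for c in bin_centers]
    total = sum(weights)
    return [w / total for w in weights]


def expected_angle(distribution, bin_centers):
    """Recover a continuous angle as the expectation over the bins."""
    return sum(p * c for p, c in zip(distribution, bin_centers))


# Hypothetical yaw bins every 3 degrees from -90 to 90.
bins = list(range(-90, 91, 3))
target = gaussian_label_distribution(angle=30.0, bin_centers=bins, sigma=3.0)
estimate = expected_angle(target, bins)
```

With such targets, an image at 30 degrees also provides some training signal for the 27 and 33 degree bins, which is one way to ease the shortage of data for rare poses.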

URL

http://arxiv.org/abs/1904.13102

PDF

http://arxiv.org/pdf/1904.13102
Read All
Efficiently Checking Actual Causality with SAT Solving

2019-04-30

Amjad Ibrahim, Simon Rehwald, Alexander Pretschner

arXiv_AI

arXiv_AI Optimization
Abstract

Recent formal approaches towards causality have made the concept ready for incorporation into the technical world. However, causality reasoning is computationally hard; and no general algorithmic approach exists that efficiently infers the causes for effects. Thus, checking causality in the context of complex, multi-agent, and distributed socio-technical systems is a significant challenge. Therefore, we conceptualize an intelligent and novel algorithmic approach towards checking causality in acyclic causal models with binary variables, utilizing the optimization power in the solvers of the Boolean Satisfiability Problem (SAT). We present two SAT encodings, and an empirical evaluation of their efficiency and scalability. We show that causality is computed efficiently in less than 5 seconds for models that consist of more than 4000 variables.

URL

http://arxiv.org/abs/1904.13101

PDF

http://arxiv.org/pdf/1904.13101
Read All
The Responsibility Quantification Model of Human Interaction with Automation

2019-04-30

Nir Douer, Joachim Meyer

arXiv_AI

arXiv_AI
Abstract

Intelligent systems and advanced automation are involved in information collection and evaluation, in decision-making and in the implementation of chosen actions. In such systems, human responsibility becomes equivocal. Understanding human responsibility is particularly important when intelligent autonomous systems can harm people, as with autonomous vehicles or, most notably, with Advanced Weapon Systems (AWS). Using Information Theory, we develop a responsibility quantification (ResQu) model of human involvement in intelligent automated systems and demonstrate its applications on decisions regarding AWS. The analysis reveals that human comparative responsibility is often low, even when major functions are allocated to the human. Thus, broadly stated policies of keeping humans in the loop and having meaningful human control are misleading and cannot truly direct decisions on how to involve humans in intelligent systems and advanced automation. Our responsibility model can guide system design decisions and can aid policy and legal decisions regarding human responsibility in intelligent systems.

URL

http://arxiv.org/abs/1810.12644

PDF

http://arxiv.org/pdf/1810.12644
Read All
CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition

2019-04-30

Yuying Zhu, Guoxin Wang, Börje F. Karlsson

arXiv_CL

arXiv_CL Segmentation Attention Embedding CNN Recognition
Abstract

Named entity recognition (NER) in Chinese is essential but difficult because of the lack of natural delimiters. Therefore, Chinese Word Segmentation (CWS) is usually considered the first step for Chinese NER. However, models based on word-level embeddings and lexicon features often suffer from segmentation errors and out-of-vocabulary (OOV) words. In this paper, we investigate a Convolutional Attention Network called CAN for Chinese NER, which consists of a character-based convolutional neural network (CNN) with a local-attention layer and a gated recurrent unit (GRU) with a global self-attention layer to capture the information from adjacent characters and sentence contexts. Also, compared to other models, our model is more practical because it does not depend on any external resources such as lexicons and employs small character embeddings. Extensive experimental results show that our approach outperforms state-of-the-art methods without word embeddings and external lexicon resources on datasets from different domains, including the Weibo, MSRA and Chinese Resume NER datasets.

URL

http://arxiv.org/abs/1904.02141

PDF

http://arxiv.org/pdf/1904.02141
Read All
Early Action Prediction with Generative Adversarial Networks

2019-04-30

Dong Wang, Yuan Yuan, Qi Wang

arXiv_CV

arXiv_CV Adversarial GAN RNN Prediction Recognition
Abstract

Action prediction aims to determine what action is occurring in a video as early as possible, which is crucial to many online applications, such as predicting a traffic accident before it happens and detecting malicious actions in a monitoring system. In this work, we address this problem by developing an end-to-end architecture that improves the discriminability of features of partially observed videos by assimilating them to features from complete videos. For this purpose, a generative adversarial network is introduced for tackling the action prediction problem, which improves the recognition accuracy of partially observed videos through narrowing the feature difference between partially observed videos and complete ones. Specifically, its generator comprises two networks: a CNN for feature extraction and an LSTM for estimating the residual error between features of the partially observed videos and complete ones. The features from the CNN are then added to the residual error from the LSTM, and the sum is regarded as the enhanced feature used to fool a competing discriminator. Meanwhile, the generator is trained with an additional perceptual objective, which forces the enhanced features of partially observed videos to be discriminative enough for action prediction. Extensive experimental results on the UCF101, BIT and UT-Interaction datasets demonstrate that our approach outperforms the state-of-the-art methods, especially for videos in which less than 50% of the frames are observed.

URL

http://arxiv.org/abs/1904.13085

PDF

http://arxiv.org/pdf/1904.13085
Read All
Appearance and Pose-Conditioned Human Image Generation using Deformable GANs

2019-04-30

Aliaksandr Siarohin, Stéphane Lathuilière, Enver Sangineto, Nicu Sebe

arXiv_CV

arXiv_CV Re-identification Adversarial GAN Person_Re-identification Quantitative
Abstract

In this paper, we address the problem of generating person images conditioned on both pose and appearance information. Specifically, given an image $x_a$ of a person and a target pose $P(x_b)$, extracted from a different image $x_b$, we synthesize a new image of that person in pose $P(x_b)$, while preserving the visual details in $x_a$. In order to deal with pixel-to-pixel misalignments caused by the pose differences between $P(x_a)$ and $P(x_b)$, we introduce deformable skip connections in the generator of our Generative Adversarial Network. Moreover, a nearest-neighbour loss is proposed instead of the common L1 and L2 losses in order to match the details of the generated image with the target image. Quantitative and qualitative results, using common datasets and protocols recently proposed for this task, show that our approach is competitive with respect to the state of the art. Moreover, we conduct an extensive evaluation using off-the-shelf person re-identification (Re-ID) systems trained with person-generation based augmented data, which is one of the most important applications for this task. Our experiments show that our Deformable GANs can significantly boost the Re-ID accuracy and are even better than data-augmentation methods specifically trained using Re-ID losses.

URL

http://arxiv.org/abs/1905.00007

PDF

http://arxiv.org/pdf/1905.00007
Read All
Memory-Augmented Temporal Dynamic Learning for Action Recognition

2019-04-30

Yuan Yuan, Dong Wang, Qi Wang

arXiv_CV

arXiv_CV Action_Recognition Embedding CNN RNN Recognition
Abstract

Human actions captured in video sequences contain two crucial factors for action recognition, i.e., visual appearance and motion dynamics. To model these two aspects, Convolutional and Recurrent Neural Networks (CNNs and RNNs) are adopted in most existing successful methods for recognizing actions. However, CNN based methods are limited in modeling long-term motion dynamics. RNNs are able to learn temporal motion dynamics but lack effective ways to tackle unsteady dynamics in long-duration motion. In this work, we propose a memory-augmented temporal dynamic learning network, which learns to write the most evident information into an external memory module and ignore irrelevant ones. In particular, we present a differential memory controller to make a discrete decision on whether the external memory module should be updated with current feature. The discrete memory controller takes in the memory history, context embedding and current feature as inputs and controls information flow into the external memory module. Additionally, we train this discrete memory controller using straight-through estimator. We evaluate this end-to-end system on benchmark datasets (UCF101 and HMDB51) of human action recognition. The experimental results show consistent improvements on both datasets over prior works and our baselines.
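The combination of a discrete write decision with gradient-based training can be illustrated with a hand-written straight-through estimator. In this sketch the forward pass applies a hard threshold, while the backward pass pretends the threshold was the identity and only differentiates the sigmoid. The gate form, the 0.5 threshold, and all names here are assumptions for illustration; the abstract does not specify the controller's architecture.

```python
import math


def gate_forward(logit):
    """Forward pass: sigmoid probability followed by a hard 0/1 decision."""
    prob = 1.0 / (1.0 + math.exp(-logit))
    gate = 1.0 if prob >= 0.5 else 0.0
    return gate, prob


def gate_backward(upstream_grad, prob):
    """Straight-through backward pass: the threshold is treated as identity,
    so only the sigmoid derivative prob * (1 - prob) is applied."""
    return upstream_grad * prob * (1.0 - prob)


def maybe_write(memory, feature, logit):
    """Append the feature to the external memory only when the gate fires."""
    gate, prob = gate_forward(logit)
    if gate == 1.0:
        memory.append(feature)
    return gate, prob


memory = []
maybe_write(memory, [0.3, 0.7], logit=2.0)   # gate fires, feature is stored
maybe_write(memory, [0.1, 0.2], logit=-2.0)  # gate stays closed, feature is skipped
```

In an autodiff framework the same effect is usually obtained by adding the detached difference between the hard and soft gate to the soft gate, rather than by writing the backward pass by hand.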

URL

http://arxiv.org/abs/1904.13080

PDF

http://arxiv.org/pdf/1904.13080
Read All
ABC: A Big CAD Model Dataset For Geometric Deep Learning

2019-04-30

Sebastian Koch, Albert Matveev, Zhongshi Jiang, Francis Williams, Alexey Artemov, Evgeny Burnaev, Marc Alexa, Denis Zorin, Daniele Panozzo

arXiv_CV

arXiv_CV Segmentation Face Deep_Learning Detection
Abstract

We introduce ABC-Dataset, a collection of one million Computer-Aided Design (CAD) models for research of geometric deep learning methods and applications. Each model is a collection of explicitly parametrized curves and surfaces, providing ground truth for differential quantities, patch segmentation, geometric feature detection, and shape reconstruction. Sampling the parametric descriptions of surfaces and curves allows generating data in different formats and resolutions, enabling fair comparisons for a wide range of geometric learning algorithms. As a use case for our dataset, we perform a large-scale benchmark for estimation of surface normals, comparing existing data driven methods and evaluating their performance against both the ground truth and traditional normal estimation methods.

URL

http://arxiv.org/abs/1812.06216

PDF

http://arxiv.org/pdf/1812.06216
Read All
Anomaly Detection in Traffic Scenes via Spatial-aware Motion Reconstruction

2019-04-30

Yuan Yuan, Dong Wang, Qi Wang

arXiv_CV

arXiv_CV Sparse Detection
Abstract

Anomaly detection from a driver’s perspective when driving is important to autonomous vehicles. As a part of Advanced Driver Assistance Systems (ADAS), it can warn the driver about dangers in a timely manner. Compared with traditionally studied scenes such as university campus and market surveillance videos, it is difficult to detect abnormal events from a driver’s perspective due to camera shake, a constantly moving background, drastic changes of vehicle velocity, etc. To tackle these specific problems, this paper proposes a spatial localization constrained sparse coding approach for anomaly detection in traffic scenes, which first measures the abnormality of motion orientation and magnitude separately and then fuses these two aspects to obtain a robust detection result. The main contributions are threefold: 1) This work describes the motion orientation and magnitude of the object separately in a new way, which is demonstrated to be better than the traditional motion descriptors. 2) The spatial localization of the object is taken into account in the sparse reconstruction framework, which utilizes the scene’s structural information and outperforms the conventional sparse coding methods. 3) Results of motion orientation and magnitude are adaptively weighted and fused by a Bayesian model, which makes the proposed method more robust and able to handle more kinds of abnormal events. The efficiency and effectiveness of the proposed method are validated by testing on nine difficult video sequences captured by ourselves. The experimental results show that the proposed method is more effective and efficient than popular competitors and yields higher performance.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.13079

PDF

http://arxiv.org/pdf/1904.13079
Interpretation of Feature Space using Multi-Channel Attentional Sub-Networks

2019-04-30

Masanari Kimura, Masayuki Tanaka

arXiv_AI

arXiv_AI Attention CNN Recognition
Abstract

Convolutional Neural Networks have achieved impressive results in various tasks, but interpreting their internal mechanism is a challenging problem. To tackle this problem, we exploit a multi-channel attention mechanism in feature space. Our network architecture allows us to obtain an attention mask for each feature, whereas existing CNN visualization methods provide only a common attention mask for all features. We apply the proposed multi-channel attention mechanism to a multi-attribute recognition task. We can obtain a different attention mask for each feature and for each attribute. These analyses give us deeper insight into the feature space of CNNs. The experimental results on the benchmark dataset show that the proposed method gives high interpretability to humans while accurately grasping the attributes of the data.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.13078

PDF

http://arxiv.org/pdf/1904.13078
A Data Dependent Multiscale Model for Hyperspectral Unmixing With Spectral Variability

2019-04-30

Ricardo Augusto Borsoi, Tales Imbiriba, José Carlos Moreira Bermudez

arXiv_CV

arXiv_CV Regularization Optimization
Abstract

Spectral variability in hyperspectral images can result from factors including environmental, illumination, atmospheric and temporal changes. Its occurrence may lead to the propagation of significant estimation errors in the unmixing process. To address this issue, extended linear mixing models have been proposed, which lead to large-scale, nonsmooth, ill-posed inverse problems. Furthermore, the regularization strategies used to obtain meaningful results have introduced interdependencies among abundance solutions that further increase the complexity of the resulting optimization problem. In this paper we present a novel data dependent multiscale model for hyperspectral unmixing accounting for spectral variability. The new method incorporates spatial contextual information into the abundances in the Extended Linear Mixing Model by using a multiscale transform based on superpixels. The proposed method results in a fast algorithm that solves the abundance problem only once in each scale during each iteration. Simulation results using synthetic and real images compare the performance, in both accuracy and execution time, of the proposed algorithm and other state-of-the-art solutions.
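
For readers unfamiliar with the mixing models mentioned above, here is a minimal sketch of the forward model only: a pixel spectrum is a weighted sum of endmember spectra, with abundances that are nonnegative and sum to one. The optional per-endmember scaling factors follow one common formulation of the Extended Linear Mixing Model and are an assumption here; the function name and the toy spectra are hypothetical, and the paper's multiscale unmixing algorithm is not reproduced.

```python
def linear_mixture(endmembers, abundances, scalings=None):
    """Mix endmember spectra into a single pixel spectrum.

    endmembers: list of P spectra, each a list of L band values
    abundances: P nonnegative fractions that sum to one
    scalings:   optional P scaling factors modelling spectral variability
                (assumed ELMM-style; all ones gives the plain linear model)
    """
    if scalings is None:
        scalings = [1.0] * len(endmembers)
    if any(a < 0 for a in abundances) or abs(sum(abundances) - 1.0) > 1e-9:
        raise ValueError("abundances must be nonnegative and sum to one")
    n_bands = len(endmembers[0])
    pixel = [0.0] * n_bands
    for spectrum, a, s in zip(endmembers, abundances, scalings):
        for band in range(n_bands):
            pixel[band] += s * a * spectrum[band]
    return pixel

# Toy example: two endmembers observed in three spectral bands.
endmembers = [[0.1, 0.4, 0.8], [0.6, 0.5, 0.2]]
plain = linear_mixture(endmembers, [0.25, 0.75])
scaled = linear_mixture(endmembers, [0.25, 0.75], scalings=[1.2, 1.0])
```

Unmixing is the inverse problem: given the pixel spectrum, recover the abundances (and, in the extended model, the scaling factors), which is why the extra unknowns make it ill-posed without regularization.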

Abstract (translated by Google)

URL

http://arxiv.org/abs/1808.01047

PDF

http://arxiv.org/pdf/1808.01047
Cooperative Localization under Limited Connectivity

2019-04-30

Jianan Zhu, Solmaz S. Kia

arXiv_RO

arXiv_RO Optimization Relation
Abstract

We report two novel decentralized multi-agent cooperative localization algorithms in which, to reduce the communication cost, inter-agent state estimate correlations are not maintained but accounted for implicitly. In our first algorithm, to guarantee filter consistency, we account for unknown inter-agent correlations via an upper bound on the joint covariance matrix of the agents. In the second method, we use an optimization framework to estimate the unknown inter-agent cross-covariance matrix. In our algorithms, each agent localizes itself in a global coordinate frame using a local filter driven by local dead reckoning and occasional absolute measurement updates, and opportunistically corrects its pose estimate whenever a relative measurement takes place between this agent and another mobile agent. To process that relative measurement, only those two agents need to communicate with each other. Consequently, our algorithms are decentralized and do not impose a restrictive network-wide connectivity condition. Moreover, we make no assumptions about the type of agents or relative measurements. We demonstrate our algorithms in simulation and in a robotic experiment.
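
To give a feel for fusing estimates whose cross-correlation is unknown, the sketch below implements covariance intersection, a standard conservative fusion rule built on an upper bound of the joint covariance. It is a swapped-in illustration and not the authors' algorithm; the 2x2 state, the grid search over the weight, and all function names are assumptions made for this example.

```python
def inv2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec2(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def covariance_intersection(xa, Pa, xb, Pb, w):
    """Fuse (xa, Pa) and (xb, Pb) with weight w in [0, 1], ignoring cross-covariance."""
    Ia, Ib = inv2(Pa), inv2(Pb)
    # Fused information matrix is a convex combination of the two information matrices.
    info = [[w * Ia[i][j] + (1.0 - w) * Ib[i][j] for j in range(2)] for i in range(2)]
    P = inv2(info)
    ya, yb = matvec2(Ia, xa), matvec2(Ib, xb)
    x = matvec2(P, [w * ya[i] + (1.0 - w) * yb[i] for i in range(2)])
    return x, P

def best_weight(Pa, Pb, steps=100):
    """Grid-search the weight that minimizes the trace of the fused covariance."""
    def fused_trace(w):
        _, P = covariance_intersection([0.0, 0.0], Pa, [0.0, 0.0], Pb, w)
        return P[0][0] + P[1][1]
    return min((i / steps for i in range(steps + 1)), key=fused_trace)

# Two estimates that are each accurate along a different axis.
Pa = [[1.0, 0.0], [0.0, 4.0]]
Pb = [[4.0, 0.0], [0.0, 1.0]]
w = best_weight(Pa, Pb)
x, P = covariance_intersection([0.0, 0.0], Pa, [2.0, 2.0], Pb, w)
```

The fused covariance stays consistent for any true cross-correlation between the two estimates, at the price of being more conservative than a filter that tracks the correlation explicitly.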

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.13074

PDF

http://arxiv.org/pdf/1904.13074
SurfelWarp: Efficient Non-Volumetric Single View Dynamic Reconstruction

2019-04-30

Wei Gao, Russ Tedrake

arXiv_CV

arXiv_CV Tracking SLAM
Abstract

We contribute a dense SLAM system that takes a live stream of depth images as input and reconstructs non-rigid deforming scenes in real time, without templates or prior models. In contrast to existing approaches, we do not maintain any volumetric data structures, such as truncated signed distance function (TSDF) fields or deformation fields, which are performance and memory intensive. Our system works with a flat point (surfel) based representation of geometry, which can be directly acquired from commodity depth sensors. Standard graphics pipelines and general purpose GPU (GPGPU) computing are leveraged for all central operations: i.e., nearest neighbor maintenance, non-rigid deformation field estimation and fusion of depth measurements. Our pipeline inherently avoids expensive volumetric operations such as marching cubes, volumetric fusion and dense deformation field update, leading to significantly improved performance. Furthermore, the explicit and flexible surfel based geometry representation enables efficient tackling of topology changes and tracking failures, which makes our reconstructions consistent with updated depth observations. Our system allows robots to maintain a scene description with non-rigidly deformed objects that potentially enables interactions with dynamic working environments.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.13073

PDF

http://arxiv.org/pdf/1904.13073
Cross-Modal Message Passing for Two-stream Fusion

2019-04-30

Dong Wang, Yuan Yuan, Qi Wang

arXiv_CV

arXiv_CV Action_Recognition Classification Quantitative Recognition
Abstract

Processing and fusing information across multiple modalities is a very useful technique for achieving high performance in many computer vision problems. In order to handle multi-modal information more effectively, we introduce a novel framework for multi-modal fusion: Cross-modal Message Passing (CMMP). Specifically, we propose a cross-modal message passing mechanism to fuse a two-stream network for action recognition, which consists of an appearance modal network (RGB image) and a motion modal network (optical flow image). The objectives of the individual networks in this framework are two-fold: a standard classification objective and a competing objective. The classification objective ensures that each modal network predicts the true action category, while the competing objective encourages each modal network to outperform the other one. We quantitatively show that the proposed CMMP fuses the traditional two-stream network more effectively, and outperforms all existing two-stream fusion methods on the UCF-101 and HMDB-51 datasets.
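
For context, the traditional two-stream baseline that CMMP is compared against is commonly fused at the score level. The sketch below shows that baseline only, as a weighted average of per-stream softmax scores; the logits, the equal default weight, and the function names are hypothetical, and the message passing and competing objective described above are not implemented.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(rgb_logits, flow_logits, rgb_weight=0.5):
    """Average the class probabilities of the appearance and motion streams."""
    p_rgb = softmax(rgb_logits)
    p_flow = softmax(flow_logits)
    fused = [rgb_weight * a + (1.0 - rgb_weight) * b
             for a, b in zip(p_rgb, p_flow)]
    return fused.index(max(fused)), fused

# Hypothetical logits for three action classes from each stream.
label, fused = late_fusion([2.0, 1.0, 0.1], [0.5, 2.5, 0.3])
```

In this toy case the RGB stream alone favours class 0, the flow stream favours class 1, and the averaged scores select class 1 because the flow stream is more confident.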

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.13072

PDF

http://arxiv.org/pdf/1904.13072
