The standard approach to providing interpretability for deep convolutional neural networks (CNNs) consists of visualizing either their feature maps or the image regions that contribute most to the prediction. In this paper, we introduce an alternative strategy to interpret the results of a CNN. To this end, we leverage a Bag-of-visual-Words (BoW) representation within the network and associate a visual and semantic meaning with the corresponding codebook elements via a generative adversarial network. The reason behind the prediction for a new sample can then be interpreted by looking at the visual representation of the most highly activated codeword. We then propose to exploit our interpretable BoW networks for adversarial example detection. To this end, we build upon the intuition that, while adversarial samples look very similar to real images, they should activate codewords with a significantly different visual representation in order to produce incorrect predictions. We therefore cast the adversarial example detection problem as that of comparing the input image with the most highly activated visual codeword. As evidenced by our experiments, this allows us to outperform state-of-the-art adversarial example detection methods on standard benchmarks, independently of the attack strategy.
http://arxiv.org/abs/1901.02229
Natural Language Inference (NLI) is a fundamental and challenging task in Natural Language Processing (NLP). Most existing methods apply a one-pass inference process to a mixed matching feature, which is a concatenation of different matching features between a premise and a hypothesis. In this paper, we propose a new model called the Multi-turn Inference Matching Network (MIMN) to perform multi-turn inference on the different matching features. In each turn, the model focuses on one particular matching feature instead of the mixed matching feature. To enhance the interaction between different matching features, a memory component is employed to store the history inference information. The inference at each turn is performed on the current matching feature and the memory. We conduct experiments on three different NLI datasets. The experimental results show that our model outperforms or matches state-of-the-art performance on all three datasets.
http://arxiv.org/abs/1901.02222
We consider the problem of detecting out-of-distribution (OOD) samples in deep reinforcement learning. In a value-based reinforcement learning setting, we propose to use uncertainty estimation techniques directly on the agent's value-estimating neural network to detect OOD samples. The focus of our work lies in analyzing the suitability of approximate Bayesian inference methods and related ensembling techniques that generate uncertainty estimates. Although prior work has shown that dropout-based variational inference techniques and bootstrap-based approaches can be used to model epistemic uncertainty, their suitability for detecting OOD samples in deep reinforcement learning remains an open question. Our results show that uncertainty estimation can be used to differentiate in- from out-of-distribution samples. Over the complete training process of the reinforcement learning agents, bootstrap-based approaches tend to produce more reliable epistemic uncertainty estimates than dropout-based approaches.
http://arxiv.org/abs/1901.02219
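A minimal sketch (not the paper's code) of the two uncertainty estimators compared above, applied to a DQN-style value network; the architecture, state dimension, and action count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim=8, n_actions=4, p_drop=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, n_actions),
        )

    def forward(self, s):
        return self.body(s)

def mc_dropout_uncertainty(net, s, n_samples=30):
    # Keep dropout active at inference time; the spread of the sampled
    # Q-values serves as an epistemic uncertainty estimate.
    net.train()
    qs = torch.stack([net(s) for _ in range(n_samples)])
    return qs.std(dim=0).mean().item()

def bootstrap_uncertainty(ensemble, s):
    # Disagreement between independently (bootstrap-)trained value networks.
    qs = torch.stack([net(s) for net in ensemble])
    return qs.std(dim=0).mean().item()
```

A state would then be flagged as out-of-distribution when its uncertainty exceeds a threshold calibrated on in-distribution data.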
As we enter the AI era, the proliferation of deep learning approaches, especially generative models, has spread beyond research communities and is being used for both good and bad purposes in society. While generative models grow stronger by creating ever more representative replicas, this strength begins to pose a threat to information integrity. We present an approach to detect synthesized content in the domain of portrait videos as a preventive solution to this threat; in other words, we build a deep fake detector. Our approach exploits biological signals extracted from facial areas, based on the observation that these signals are not well preserved spatially and temporally in synthetic content. First, we exhibit several unary and binary signal transformations for the pairwise separation problem, achieving 99.39% accuracy in detecting fake portrait videos. Second, we use those findings to formulate a generalized classifier of authentic and fake content by analyzing the characteristics of the proposed signal transformations and their corresponding feature sets. We evaluated FakeCatcher on both the Face Forensics dataset [46] and our newly introduced Deep Fakes dataset, achieving 82.55% and 77.33% accuracy, respectively. Third, we are releasing this mixed dataset of synthesized videos, collected as part of our evaluation process, which contains fake portrait videos “in the wild”, independent of any specific generative model, of video compression, and of context. We also analyzed the effects of different facial regions, video segment durations, and dimensionality reduction techniques, and compared our detection rate to recent approaches.
http://arxiv.org/abs/1901.02212
Despite the advantages of all-weather, all-day, high-resolution imaging, SAR remote sensing images are much less viewed and used by the general public because human vision is not adapted to the microwave scattering phenomenon. However, expert interpreters can be trained by comparing side-by-side SAR and optical images to learn the translation rules from SAR to optical. This paper attempts to develop machine intelligence that is trainable with large volumes of co-registered SAR and optical images to translate SAR images into optical versions for assisted SAR interpretation. A novel reciprocal GAN scheme is proposed for this translation task. It is trained and tested on both spaceborne GF-3 and airborne UAVSAR images. Comparisons and analyses are presented for datasets of different resolutions and polarizations. Results show that the proposed translation network works well under many scenarios and could potentially be used for assisted SAR interpretation.
http://arxiv.org/abs/1901.03749
Segregation is the separation of social groups in the physical or online world. Segregation discovery consists of finding contexts of segregation. In the modern digital society, discovering segregation is challenging due to the large amount and variety of social data. We present a tool in support of segregation discovery from relational and graph data. The SCube system builds on attributed graph clustering and frequent itemset mining. It offers the analyst a multi-dimensional segregation data cube for exploratory data analysis. The demonstration first guides the audience through the relevant social science concepts. Then, it focuses on scenarios around case studies of gender occupational segregation. Two real and large datasets about the boards of directors of Italian and Estonian companies will be explored in search of segregation contexts. The architecture of the SCube system and its computational efficiency challenges and solutions are discussed.
https://arxiv.org/abs/1709.08348
Generative Adversarial Networks (GANs) boast an impressive capacity to generate realistic images. However, like much of the field of deep learning, they require an inordinate amount of data to produce results, thereby limiting their usefulness in generating novelty. In the same vein, recent advances in meta-learning have opened the door to many few-shot learning applications. In the present work, we propose Few-shot Image Generation using Reptile (FIGR), a GAN meta-trained with Reptile. Our model successfully generates novel images on both MNIST and Omniglot with as few as 4 images from an unseen class. We further contribute FIGR-8, a new dataset for few-shot image generation, which contains 1,548,944 icons categorized into over 18,409 classes. Trained on FIGR-8, initial results show that our model can generalize to more advanced concepts (such as “bird” and “knife”) from as few as 8 samples from a previously unseen class of images and as few as 10 training steps through those 8 images. This work demonstrates the potential of training a GAN for few-shot image generation and aims to set a new benchmark for future work in the domain.
http://arxiv.org/abs/1901.02199
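The Reptile outer loop itself is compact. The following sketch applies it to a GAN's parameters under stated assumptions: `gan` is an nn.Module bundling generator and discriminator, and `inner_train` is a hypothetical helper that runs a few ordinary adversarial training steps on one few-shot class. This illustrates the general recipe, not the authors' implementation.

```python
import copy
import torch

def reptile_step(gan, class_batch, inner_train, inner_steps=10, meta_lr=0.1):
    fast = copy.deepcopy(gan)                          # task-specific clone
    inner_train(fast, class_batch, steps=inner_steps)  # ordinary GAN training on one class
    with torch.no_grad():
        for p, q in zip(gan.parameters(), fast.parameters()):
            p += meta_lr * (q - p)                     # Reptile: move meta-weights toward adapted ones
```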
In this paper, we propose to disentangle and interpret contextual effects that are encoded in a pre-trained deep neural network. We use our method to explain the gaming strategy of the AlphaGo Zero model. Unlike previous studies that visualized image appearances corresponding to the network output or a neural activation only from a global perspective, our research aims to clarify how a certain input unit (dimension) collaborates with other units (dimensions) to constitute inference patterns of the neural network and thus contribute to the network output. The analysis of local contextual effects w.r.t. certain input units is of special value in real applications. Explaining the logic of the AlphaGo Zero model is a typical application. In experiments, our method successfully disentangled the rationale of each move during the Go game.
http://arxiv.org/abs/1901.02184
A review of Word Embedding Models through a deconstructive approach reveals several of their shortcomings and inconsistencies. These include instability of the vector representations, distorted analogical reasoning, geometric incompatibility with linguistic features, and inconsistencies in the corpus data. A new theoretical embedding model, the Derridian Embedding, is proposed in this paper. Contemporary embedding models are evaluated qualitatively in terms of how adequate they are in relation to the capabilities of a Derridian Embedding.
http://arxiv.org/abs/1902.00551
An ensemble method that fuses the output decision vectors of multiple feedforward-designed convolutional neural networks (FF-CNNs) to solve the image classification problem is proposed in this work. To enhance the performance of the ensemble system, it is critical to increase the diversity of the FF-CNN models. To achieve this objective, we introduce diversity by adopting three strategies: 1) different parameter settings in convolutional layers, 2) flexible feature subsets fed into the fully-connected (FC) layers, and 3) multiple image embeddings of the same input source. Furthermore, we partition input samples into easy and hard ones based on their decision confidence scores. As a result, we can develop a new ensemble system tailored to hard samples to further boost classification accuracy. Experiments are conducted on the MNIST and CIFAR-10 datasets to demonstrate the effectiveness of the ensemble method.
http://arxiv.org/abs/1901.02154
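A sketch of the fusion-and-routing step described above, with assumed shapes and an assumed confidence threshold; the second-stage ensemble trained on hard samples is left abstract.

```python
import numpy as np

def fuse_and_route(prob_list, conf_thresh=0.9):
    # prob_list: list of (n_samples, n_classes) softmax outputs, one per FF-CNN variant
    fused = np.mean(prob_list, axis=0)      # average the decision vectors
    confidence = fused.max(axis=1)          # top-1 probability as the confidence score
    easy = confidence >= conf_thresh        # keep the fused prediction for easy samples
    hard = ~easy                            # route these to the hard-sample ensemble
    return fused.argmax(axis=1), easy, hard
```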
Nowadays, CAPTCHAs are computer generated tests that human can pass but current computer systems can not. They have common usage in various web services in order to be able to detect a human from computer programs autonomously. In this way, owners can protect their web services from bots. In addition to visual CAPTCHAs which consist of distorted images, mostly test images, that a user must write some description about that image, there are a significant amount of audio CAPTCHAs as well. Briefly, audio CAPTCHAs are sound files which consist of human sound under heavy noise where the speaker pronounces a bunch of digits consecutively. Generally, in those sound files, there are some periodic and non-periodic noises to get difficult to recognize them with a program but not for a human listener. We gathered numerous randomly collected audio file to train and then test them using our SVM algorithm to be able to extract digits out of each conversation.
http://arxiv.org/abs/1901.02153
Deep convolutional neural networks (CNNs) are deployed in various applications but demand immense computation. Pruning techniques and Winograd convolution are two typical methods to reduce CNN computation. However, they cannot be directly combined because the Winograd transformation fills in the sparsity resulting from pruning. Li et al. (2017) propose sparse Winograd convolution, in which weights are directly pruned in the Winograd domain, but this technique is not very practical because Winograd-domain retraining requires low learning rates and hence significantly longer training time. In addition, Liu et al. (2018) move the ReLU function into the Winograd domain, which can help increase the weight sparsity but requires changes to the network structure. To achieve high Winograd-domain weight sparsity without changing network structures, we propose a new pruning method, spatial-Winograd pruning. As the first step, spatial-domain weights are pruned in a structured way, which efficiently transfers the spatial-domain sparsity into the Winograd domain and avoids Winograd-domain retraining. In the next step, we also perform pruning and retraining directly in the Winograd domain, but propose to use an importance factor matrix to adjust weight importance and weight gradients. This adjustment makes it possible to effectively retrain the pruned Winograd-domain network without changing the network structure. For three models on the CIFAR-10, CIFAR-100, and ImageNet datasets, our proposed method achieves Winograd-domain sparsities of 63%, 50%, and 74%, respectively.
http://arxiv.org/abs/1901.02132
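For concreteness, a sketch of the second, Winograd-domain step for F(2x2, 3x3): the transform matrix G below is the standard one, while the threshold choice is illustrative and the paper's importance-factor-based retraining is omitted.

```python
import numpy as np

# Standard Winograd weight-transform matrix for F(2x2, 3x3)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])

def winograd_weights(w3x3):
    return G @ w3x3 @ G.T                # 3x3 spatial kernel -> 4x4 Winograd-domain kernel

def prune_winograd(w3x3, sparsity=0.6):
    wd = winograd_weights(w3x3)
    thresh = np.quantile(np.abs(wd), sparsity)
    return np.where(np.abs(wd) >= thresh, wd, 0.0)   # magnitude pruning in the Winograd domain
```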
This paper presents a cost-sensitive active Question-Answering (QA) framework for learning a nine-layer And-Or graph (AOG) from web images. The AOG explicitly represents object categories, poses/viewpoints, parts, and detailed structures within the parts in a compositional hierarchy. The QA framework is designed to minimize an overall risk, which trades off the loss and query costs. The loss is defined for nodes in all layers of the AOG, including the generative loss (measuring the likelihood of the images) and the discriminative loss (measuring the fitness to human answers). The cost comprises both the human labor of answering questions and the computational cost of model learning. The cost-sensitive QA framework iteratively selects different storylines of questions to update different nodes in the AOG. Experiments showed that our method required much less human supervision (e.g., labeling parts on 3–10 training objects for each category) and achieved better performance than baseline methods.
http://arxiv.org/abs/1708.03911
Learning a policy using only observational data is challenging because the distribution of states it induces at execution time may differ from the distribution observed during training. We propose to train a policy by unrolling a learned model of the environment dynamics over multiple time steps while explicitly penalizing two costs: the original cost the policy seeks to optimize, and an uncertainty cost which represents its divergence from the states it is trained on. We measure this second cost by using the uncertainty of the dynamics model about its own predictions, using recent ideas from uncertainty estimation for deep networks. We evaluate our approach using a large-scale observational dataset of driving behavior recorded from traffic cameras, and show that we are able to learn effective driving policies from purely observational data, with no environment interaction.
http://arxiv.org/abs/1901.02705
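A sketch of the training objective under stated assumptions: `dynamics` is a learned model with dropout layers, `policy` and `task_cost` are assumed modules, and the uncertainty cost is approximated by the variance of MC-dropout rollouts, one instance of the deep-network uncertainty estimates the abstract alludes to.

```python
import torch

def rollout_loss(dynamics, policy, task_cost, s0, horizon=10, n_mc=8, lam=0.5):
    s, loss = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        dynamics.train()                              # keep dropout active for MC sampling
        preds = torch.stack([dynamics(s, a) for _ in range(n_mc)])
        s_next = preds.mean(dim=0)
        uncertainty = preds.var(dim=0).mean()         # penalize divergence from training states
        loss = loss + task_cost(s_next) + lam * uncertainty
        s = s_next
    return loss
```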
There is a previously identified equivalence between wide fully connected neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables, for instance, test set predictions that would have resulted from a fully Bayesian, infinitely wide trained FCN to be computed without ever instantiating the FCN, but by instead evaluating the corresponding GP. In this work, we derive an analogous equivalence for multi-layer convolutional neural networks (CNNs), both with and without pooling layers, and achieve state-of-the-art results on CIFAR-10 for GPs without trainable kernels. We also introduce a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible. Surprisingly, in the absence of pooling layers, the GPs corresponding to CNNs with and without weight sharing are identical. As a consequence, translation equivariance in finite-channel CNNs trained with stochastic gradient descent (SGD) has no corresponding property in the Bayesian treatment of the infinite-channel limit, a qualitative difference between the two regimes that is not present in the FCN case. We confirm experimentally that, while in some scenarios the performance of SGD-trained finite CNNs approaches that of the corresponding GPs as the channel count increases, with careful tuning SGD-trained CNNs can significantly outperform their corresponding GPs, suggesting advantages of SGD training over fully Bayesian parameter estimation.
http://arxiv.org/abs/1810.05148
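A minimal sketch of such a Monte Carlo kernel estimate, assuming a scalar-output architecture factory `make_net` that returns freshly initialized networks; the GP kernel entry K(x1, x2) is approximated by the empirical second moment of the outputs over random initializations.

```python
import torch

def mc_nngp_kernel(make_net, x1, x2, n_nets=500):
    prods = []
    with torch.no_grad():
        for _ in range(n_nets):
            net = make_net()                  # fresh random draw of weights and biases
            prods.append(net(x1).item() * net(x2).item())
    # Empirical E[f(x1) f(x2)] over random networks approximates K(x1, x2)
    return sum(prods) / n_nets
```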
A prerequisite to successfully alleviate pain in animals is to recognize it, which is a great challenge in non-verbal species. Furthermore, prey animals such as horses tend to hide their pain. In this study, we propose a deep recurrent two-stream architecture for the task of distinguishing pain from non-pain in videos of horses. Different models are evaluated on a unique dataset showing horses under controlled trials with moderate pain induction, which has been presented in earlier work. Sequential models are experimentally compared to single-frame models, showing the importance of the temporal dimension of the data, and are benchmarked against a veterinary expert classification of the data. We additionally perform baseline comparisons with generalized versions of state-of-the-art human pain recognition methods. While equine pain detection in machine learning is a novel field, our results surpass veterinary expert performance and outperform pain detection results reported for other larger non-human species.
http://arxiv.org/abs/1901.02106
We analyze the joint probability distribution of the lengths of the vectors of hidden variables in different layers of a fully connected deep network, when the weights and biases are chosen randomly according to Gaussian distributions and the input is in $\{-1, 1\}^N$. We show that, if the activation function $\phi$ satisfies a minimal set of assumptions, satisfied by all activation functions we know to be used in practice, then, as the width of the network gets large, the “length process” converges in probability to a length map that is determined as a simple function of the variances of the random weights and biases and the activation function $\phi$. We also show that this convergence may fail for $\phi$ that violate our assumptions.
http://arxiv.org/abs/1901.02104
In this note we discuss a common misconception, namely that embeddings are always used to reduce the dimensionality of the item space. We show that, when we measure dimensionality in terms of information entropy, the embedding of sparse probability distributions, which can be used to represent sparse features or data, may or may not reduce the dimensionality of the item space. However, the embeddings do provide a different and often more meaningful representation of the items for the particular task at hand. We also give upper bounds and more precise guidelines for choosing the embedding dimension.
http://arxiv.org/abs/1901.02103
We describe our entry for the Systematic Review Information Extraction track of the 2018 Text Analysis Conference. Our solution is an end-to-end, deep learning, sequence tagging model based on the BI-LSTM-CRF architecture. However, we use interleaved, alternating LSTM layers with highway connections instead of the more traditional approach, where the last hidden states of both directions are concatenated to create the input to the next layer. We also make extensive use of pre-trained word embeddings, namely GloVe and ELMo. Thanks to a number of regularization techniques, we were able to achieve a relatively large model capacity (31.3M+ trainable parameters) for the size of the training set (100 documents, fewer than 200K tokens). The system's official score was 60.9% (micro-F1), and it ranked first for Task 1. Additionally, after rectifying an obvious mistake in the submission format, the system scored 67.35%.
http://arxiv.org/abs/1901.02081
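A sketch (not the authors' exact architecture) of interleaved, direction-alternating LSTM layers with highway connections, assuming equal input and hidden sizes for simplicity.

```python
import torch
import torch.nn as nn

class AlternatingHighwayLSTM(nn.Module):
    def __init__(self, dim, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(nn.LSTM(dim, dim, batch_first=True) for _ in range(n_layers))
        self.gates = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_layers))

    def forward(self, x):                         # x: (batch, time, dim)
        for i, (lstm, gate) in enumerate(zip(self.layers, self.gates)):
            h = x.flip(1) if i % 2 else x         # alternate direction every layer
            out, _ = lstm(h)
            out = out.flip(1) if i % 2 else out   # restore the original time order
            t = torch.sigmoid(gate(x))            # highway transform gate
            x = t * out + (1 - t) * x             # highway connection instead of concatenation
        return x
```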
Image feature matching is a fundamental part of many geometric computer vision applications, and using multiple images can improve performance. In this work, we formulate multi-image matching as a graph embedding problem, then use a Graph Convolutional Network to learn an appropriate embedding function for aligning image features. We use cycle consistency to train our network in an unsupervised fashion, since ground truth correspondence is difficult or expensive to acquire. In addition, geometric consistency losses can be added at training time, even if the information is not available in the test set, unlike previous approaches that optimize cycle consistency directly. To the best of our knowledge, no other work has used learning for multi-image feature matching. Our experiments show that our method is competitive with other optimization-based approaches.
http://arxiv.org/abs/1901.02078
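A sketch of an unsupervised cycle-consistency loss over soft correspondence matrices for a triplet of images, assuming per-image feature embeddings of shape (n_points, d); the actual network and loss details in the paper may differ.

```python
import torch

def soft_matches(fa, fb, tau=0.1):
    # Row-stochastic correspondence matrix from embedding similarity
    return torch.softmax(fa @ fb.t() / tau, dim=1)

def cycle_loss(f1, f2, f3):
    p12, p23, p13 = soft_matches(f1, f2), soft_matches(f2, f3), soft_matches(f1, f3)
    # Matches composed through image 2 should agree with direct matches 1 -> 3
    return ((p12 @ p23 - p13) ** 2).sum()
```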
Mobile and general-purpose robots increasingly support our everyday life, requiring dependable robotics control software. Creating such software mainly amounts to implementing their complex behaviors, known as missions. Recognizing the need, a large number of domain-specific specification languages have been proposed. These, in addition to traditional logical languages, allow the use of formally specified missions for synthesis, verification, simulation, or guiding the implementation. For instance, the logical language LTL is commonly used by experts to specify missions as an input for planners, which synthesize the behavior a robot should have. Unfortunately, domain-specific languages are usually tied to specific robot models, while logical languages such as LTL are difficult for non-experts to use. We present a catalog of 22 mission specification patterns for mobile robots, together with tooling for instantiating, composing, and compiling the patterns to create mission specifications. The patterns provide solutions for recurrent specification problems, each detailing the usage intent, known uses, relationships to other patterns, and, most importantly, a template mission specification in temporal logic. Our tooling produces specifications expressed in the LTL and CTL temporal logics to be used by planners, simulators, or model checkers. The patterns originate from 245 realistic textual mission requirements extracted from the robotics literature, and they are evaluated on a total of 441 real-world mission requirements and 1251 mission specifications. Five of these reflect scenarios we defined with two well-known industrial partners developing human-size robots. We validated the correctness of our patterns with simulators and two real robots.
http://arxiv.org/abs/1901.02077
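As an illustration of what such a pattern template looks like (a typical patrolling-style formula, not quoted from the catalog), a mission requiring a robot to visit locations $l_1$, $l_2$, $l_3$ infinitely often can be specified in LTL as:

```latex
% Patrolling over three locations: each location is visited infinitely often.
\varphi_{\mathrm{patrol}} \;=\; \mathcal{G}\,\mathcal{F}\, l_1 \;\land\; \mathcal{G}\,\mathcal{F}\, l_2 \;\land\; \mathcal{G}\,\mathcal{F}\, l_3
```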
Convolutional Neural Networks (CNN) have been successful in processing data signals that are uniformly sampled in the spatial domain (e.g., images). However, most data signals do not natively exist on a grid, and in the process of being sampled onto a uniform physical grid suffer significant aliasing error and information loss. Moreover, signals can exist in different topological structures as, for example, points, lines, surfaces and volumes. It has been challenging to analyze signals with mixed topologies (for example, point cloud with surface mesh). To this end, we develop mathematical formulations for Non-Uniform Fourier Transforms (NUFT) to directly, and optimally, sample nonuniform data signals of different topologies defined on a simplex mesh into the spectral domain with no spatial sampling error. The spectral transform is performed in the Euclidean space, which removes the translation ambiguity from works on the graph spectrum. Our representation has four distinct advantages: (1) the process causes no spatial sampling error during the initial sampling, (2) the generality of this approach provides a unified framework for using CNNs to analyze signals of mixed topologies, (3) it allows us to leverage state-of-the-art backbone CNN architectures for effective learning without having to design a particular architecture for a particular data structure in an ad-hoc fashion, and (4) the representation allows weighted meshes where each element has a different weight (i.e., texture) indicating local properties. We achieve results on par with the state-of-the-art for the 3D shape retrieval task, and a new state-of-the-art for the point cloud to surface reconstruction task.
http://arxiv.org/abs/1901.02070
With the rise of artificial intelligence in recent years, Deep Neural Networks (DNNs) have been widely used in many domains. To achieve high performance and energy efficiency, hardware acceleration of DNNs (especially inference) is intensively studied both in academia and industry. However, we still face two challenges: large DNN models and datasets, which incur frequent off-chip memory accesses; and the training of DNNs, which is not well explored in recent accelerator designs. To truly provide high-throughput and energy-efficient acceleration for the training of deep and large models, we inevitably need multiple accelerators to exploit coarse-grain parallelism, beyond the fine-grain parallelism inside a layer considered in most existing architectures. This poses the key research question of finding the best organization of computation and dataflow among accelerators. In this paper, we propose HyPar, a solution that determines layer-wise parallelism for deep neural network training with an array of DNN accelerators. HyPar partitions the feature map tensors (input and output), the kernel tensors, the gradient tensors, and the error tensors across the DNN accelerators. A partition constitutes the choice of parallelism for the weighted layers. The optimization target is to search for a partition that minimizes the total communication during the training of a complete DNN. To solve this problem, we propose a communication model to explain the source and amount of communication. Then, we use a hierarchical layer-wise dynamic programming method to search for the partition for each layer.
https://arxiv.org/abs/1901.02067
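A sketch of the hierarchical layer-wise dynamic program with assumed cost interfaces (not HyPar's code): each layer picks data- or model-parallelism, `comm(i, p)` is layer i's communication under choice p, and `trans(i, q, p)` is the re-partitioning cost between consecutive layers.

```python
def layerwise_partition(n_layers, comm, trans, choices=("data", "model")):
    best = {p: comm(0, p) for p in choices}          # cost of layer 0 under each choice
    back = [dict() for _ in range(n_layers)]
    for i in range(1, n_layers):
        new = {}
        for p in choices:
            # cheapest way to reach layer i with parallelism p
            prev = min(choices, key=lambda q: best[q] + trans(i, q, p))
            new[p] = best[prev] + trans(i, prev, p) + comm(i, p)
            back[i][p] = prev
        best = new
    p = min(best, key=best.get)                      # cheapest final choice
    total, plan = best[p], [p]
    for i in range(n_layers - 1, 0, -1):             # backtrack the minimizing partition
        p = back[i][p]
        plan.append(p)
    return total, plan[::-1]
```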
Simulators that generate observations based on theoretical models can be important tools for the development, prediction, and assessment of signal processing algorithms. In order to design these simulators, painstaking effort is required to construct mathematical models according to their application. Complex models are sometimes necessary to represent a variety of real phenomena. In contrast, obtaining synthetic observations from generative models developed from real observations often requires much less effort. This paper proposes a generative model based on adversarial learning. Given that observations are typically signals composed of a linear combination of sinusoidal waves and random noise, sinusoidal wave generating networks are first designed based on an adversarial network. Audio waveform generation can then be performed using the proposed network. Several approaches to designing the objective function of the proposed network using adversarial learning are investigated experimentally. In addition, amphibian sound classification is performed using a convolutional neural network trained with real and synthetic sounds. Both qualitative and quantitative results show that the proposed generative model produces realistic signals and is very helpful for data augmentation and data analysis.
http://arxiv.org/abs/1901.02050
In this work, we introduce the concept of bandlimiting into the theory of machine learning, because all physical processes are bandlimited by nature, including real-world machine learning tasks. Once the bandlimiting constraint is taken into account, our theoretical analysis shows that all practical machine learning tasks are asymptotically solvable in a perfect sense. Furthermore, the key to this solvability relies almost solely on two factors: i) a sufficiently large amount of training samples, beyond a threshold determined by a difficulty measurement of the underlying task; and ii) a sufficiently complex model that is properly bandlimited. Moreover, for unimodal data distributions, we derive a new error bound for perfect learning, which can quantify the difficulty of learning. This case-specific bound is much tighter than the uniform bounds in conventional learning theory.
http://arxiv.org/abs/1901.02046
Whole brain segmentation on structural magnetic resonance imaging (MRI) is essential for understanding neuroanatomical-functional relationships. Traditionally, multi-atlas segmentation has been regarded as the standard method for whole brain segmentation. In the past few years, deep convolutional neural network (DCNN) segmentation methods have demonstrated their advantages in both accuracy and computational efficiency. Recently, we proposed the spatially localized atlas network tiles (SLANT) method, which is able to segment a 3D MRI brain scan into 132 anatomical regions. Commonly, DCNN segmentation methods yield inferior performance under external validation, especially when the testing patterns were not present in the training cohorts. Recently, we obtained a clinically acquired, multi-sequence MRI brain cohort with 1480 de-identified brain MRI scans from 395 patients, using seven different MRI protocols. Moreover, each subject has at least two scans from different MRI protocols. Herein, we assess the SLANT method's intra- and inter-protocol reproducibility. SLANT achieved a coefficient of variation (CV) of less than 0.05 for intra-protocol experiments and less than 0.15 for inter-protocol experiments. The results show that the SLANT method achieved high intra- and inter-protocol reproducibility.
http://arxiv.org/abs/1901.02040
We present an efficient convolution kernel for Convolutional Neural Networks (CNNs) on unstructured grids using parameterized differential operators, focusing on spherical signals such as panorama images or planetary signals. To this end, we replace conventional convolution kernels with linear combinations of differential operators that are weighted by learnable parameters. Differential operators can be efficiently estimated on unstructured grids using one-ring neighbors, and the learnable parameters can be optimized through standard back-propagation. As a result, we obtain extremely efficient neural networks that match or outperform state-of-the-art network architectures in terms of performance, but with a significantly lower number of network parameters. We evaluate our algorithm in an extensive series of experiments on a variety of computer vision and climate science tasks, including shape classification, climate pattern segmentation, and omnidirectional image semantic segmentation. Overall, we present (1) a novel CNN approach on unstructured grids using parameterized differential operators for spherical signals, and (2) a demonstration that our unique kernel parameterization allows our model to achieve the same or higher accuracy with significantly fewer network parameters.
http://arxiv.org/abs/1901.02039
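A sketch of the parameterized-differential-operator convolution: precomputed sparse mesh operators (e.g., identity, two gradient components, and the Laplacian) are mixed by learnable weights. The operator construction from one-ring neighbors is assumed and omitted here.

```python
import torch
import torch.nn as nn

class DiffOpConv(nn.Module):
    def __init__(self, ops, in_ch, out_ch):
        super().__init__()
        self.ops = ops   # list of sparse (n_verts, n_verts) differential operators
        self.weight = nn.Parameter(0.1 * torch.randn(len(ops), out_ch, in_ch))

    def forward(self, x):                                # x: (n_verts, in_ch)
        # Apply each differential operator, then a learned per-operator channel mixing
        feats = [torch.sparse.mm(op, x) for op in self.ops]
        return sum(f @ w.t() for f, w in zip(feats, self.weight))
```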
We propose a generalized decision-theoretic system for a heterogeneous team of autonomous agents tasked with the online identification of phenotypically expressed stress in crop fields. This system employs four distinct types of agents, specific to four available sensor modalities: satellites (Layer 3), uninhabited aerial vehicles (L2), uninhabited ground vehicles (L1), and static ground-level sensors (L0). Layers 3, 2, and 1 are tasked with performing image processing at the available resolution of the sensor modality and, along with data generated by Layer 0 sensors, identifying erroneous differences that arise over time. Our goal is to limit the use of the more computationally and temporally expensive subsequent layers. Therefore, from Layer 3 to Layer 1, each layer only investigates areas that previous layers have identified as potentially afflicted by stress. We introduce a reinforcement learning technique based on Perkins' Monte Carlo Exploring Starts for a generalized Markovian model of each layer's decision problem, and label the system the Agricultural Distributed Decision Framework (ADDF). As our domain is real-world and online, we illustrate implementations of the two major components of our system: a clustering-based image processing methodology and a two-layer POMDP implementation.
http://arxiv.org/abs/1901.02035
We present a system for learning the motion of independently moving objects from stereo videos. The only human annotation used in our system is 2D object bounding boxes, which introduce the notion of objects to our system. Unlike prior learning-based work, which has focused on predicting a dense pixel-wise optical flow field and/or a depth map for each image, we propose to predict object-instance-specific 3D scene flow maps and instance masks, from which we are able to derive the motion direction and speed of each object instance. Our network takes the 3D geometry of the problem into account, which allows it to correlate the input images. We present experiments evaluating the accuracy of our 3D flow vectors, as well as depth maps and projected 2D optical flow, where our jointly learned system outperforms earlier approaches trained for each task independently.
http://arxiv.org/abs/1901.01971
There are moments when we make decisions involving tradeoffs among costs and benefits occurring at different times. Essentially, in these cases, we are evaluating dynamic processes with outcomes still unknown. So, do we use some intuitive logic to judge changes involving values and time? The fuzzy temporal logic introduced in this paper proposes to model the figures of thought necessary to form a rhetoric for decision-making. As examples, intertemporal choices and lottery choices are analyzed. The first problem is related to the time preference for receiving amounts on different dates. We show that a subadditive hyperbolic discount function is not an anomaly, but consistently describes the delay of goods within the fuzzy temporal logic. The second problem is related to the values and probabilities of lotteries, where Prospect Theory behaviors and the S-shaped curve can be described using tense operators and fuzzy set operators. In addition, it is shown that some behaviors are amount-dependent, where fuzziness can be decisive in the judgment. Thus, time, uncertainty, and fuzziness are unified in a single framework that models the rhetoric for decision-making in different contexts of gains and losses.
http://arxiv.org/abs/1901.01970
Total variation (TV) is a powerful regularization method that has been widely applied in different imaging applications, but it is difficult to apply to diffuse optical tomography (DOT) image reconstruction (the inverse problem) due to complex and unstructured geometries, the non-linearity of the data fitting and regularization terms, and the non-differentiability of the regularization term. We develop several approaches to overcome these difficulties by: i) defining discrete differential operators for unstructured geometries using both finite element and graph representations; ii) developing an optimization algorithm based on the alternating direction method of multipliers (ADMM) for the non-differentiable and non-linear minimization problem; and iii) investigating isotropic and anisotropic variants of TV regularization, and comparing their finite element- and graph-based implementations. These approaches are evaluated in experiments on simulated data and on real data acquired from a tissue phantom. Our results show that both FEM- and graph-based TV regularization are able to accurately reconstruct both sparse and non-sparse distributions without the over-smoothing effect of Tikhonov regularization or the over-sparsifying effect of L$_1$ regularization. The graph representation was found to outperform the FEM method for low-resolution meshes, while the FEM method was more accurate for high-resolution meshes.
http://arxiv.org/abs/1901.01969
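To make the ADMM component concrete, here is a sketch for graph-based anisotropic TV denoising, min_x 0.5||x - y||^2 + lam ||Dx||_1, with D an (n_edges, n_nodes) difference matrix; the DOT forward model is replaced by a simple quadratic data term for illustration.

```python
import numpy as np

def tv_admm(y, D, lam=0.1, rho=1.0, n_iter=100):
    n = y.size
    A = np.eye(n) + rho * D.T @ D          # x-update system (fixed, so factor once in practice)
    z = np.zeros(D.shape[0])
    u = np.zeros_like(z)                   # scaled dual variable
    x = y.copy()
    for _ in range(n_iter):
        x = np.linalg.solve(A, y + rho * D.T @ (z - u))
        Dx = D @ x
        v = Dx + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)   # soft-thresholding
        u += Dx - z                        # dual update
    return x
```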
Recent work using auxiliary prediction task classifiers to investigate the properties of LSTM representations has begun to shed light on why pretrained representations, like ELMo (Peters et al., 2018) and CoVe (McCann et al., 2017), are so beneficial for neural language understanding models. We still, though, do not yet have a clear understanding of how the choice of pretraining objective affects the type of linguistic information that models learn. With this in mind, we compare four objectives—language modeling, translation, skip-thought, and autoencoding—on their ability to induce syntactic and part-of-speech information. We make a fair comparison between the tasks by holding constant the quantity and genre of the training data, as well as the LSTM architecture. We find that representations from language models consistently perform best on our syntactic auxiliary prediction tasks, even when trained on relatively small amounts of data. These results suggest that language modeling may be the best data-rich pretraining task for transfer learning applications requiring syntactic information. We also find that the representations from randomly-initialized, frozen LSTMs perform strikingly well on our syntactic auxiliary tasks, but this effect disappears when the amount of training data for the auxiliary tasks is reduced.
http://arxiv.org/abs/1809.10040
The main goal of network pruning is to impose sparsity on a neural network by increasing the number of zero-valued parameters, in order to reduce the architecture size and obtain a computational speedup. In most previous research, sparsity is imposed stochastically, without considering any prior knowledge of the weight distribution or other internal network characteristics. Enforcing too much sparsity may induce an accuracy drop, since many important elements might be eliminated. In this paper, we propose Guided Attention for Sparsity Learning (GASL) to achieve (1) model compression, through fewer elements and a speedup; (2) prevention of the accuracy drop, by supervising the sparsity operation via a guided attention mechanism; and (3) a generic mechanism that can be adapted to any type of architecture. Our work is aimed at providing a framework based on interpretable attention mechanisms for imposing structured and non-structured sparsity in deep neural networks. In CIFAR-100 experiments, we achieved the state-of-the-art sparsity level and a 2.91x speedup with accuracy competitive with the best method. For MNIST with the LeNet architecture, we also achieved the highest sparsity and speedup levels.
http://arxiv.org/abs/1901.01939
We introduce a variation of the convolutional layer called DSConv (Distribution Shifting Convolution) that can be readily substituted into standard neural network architectures to achieve both lower memory usage and higher computational speed. DSConv breaks down the traditional convolution kernel into two components: a Variable Quantized Kernel (VQK) and Distribution Shifts. Lower memory usage and higher speed are achieved by storing only integer values in the VQK, while preserving the same output as the original convolution by applying both kernel- and channel-based distribution shifts. We test DSConv on ImageNet with ResNet50 and ResNet34, as well as AlexNet and MobileNet. We achieve a reduction in memory usage of up to 14x in the convolutional kernels and speed up operations by up to 10x by substituting integer operations for floating point operations. Furthermore, unlike other quantization approaches, our work allows for a degree of retraining on new tasks and datasets.
http://arxiv.org/abs/1901.01928
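A sketch of the storage side of the idea: an integer Variable Quantized Kernel plus floating-point shifts that restore the original kernel's range. Block size, bit width, and shift granularity here are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def dsconv_quantize(w, bits=4, block=32):
    # w: (out_ch, in_ch * k * k) flattened float kernels, quantized block-wise
    qmax = 2 ** (bits - 1) - 1
    vqk, shifts = [], []
    for i in range(0, w.shape[1], block):
        blk = w[:, i:i + block]
        scale = np.abs(blk).max(axis=1, keepdims=True) / qmax + 1e-12   # distribution shift
        vqk.append(np.round(blk / scale).astype(np.int8))               # integer VQK values
        shifts.append(scale)
    # Inference multiplies integer dot products by the stored shifts
    return vqk, shifts
```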
Blind deconvolution is the problem of recovering a convolutional kernel $\boldsymbol a_0$ and an activation signal $\boldsymbol x_0$ from their convolution $\boldsymbol y = \boldsymbol a_0 \circledast \boldsymbol x_0$. This problem is ill-posed without further constraints or priors. This paper studies the situation where the nonzero entries of the activation signal are sparsely and randomly populated. We normalize the convolution kernel to have unit Frobenius norm and cast the sparse blind deconvolution problem as a nonconvex optimization problem over the sphere. With this spherical constraint, every spurious local minimum turns out to be close to some signed shift truncation of the ground truth, under certain hypotheses. This benign property motivates an effective two-stage algorithm that recovers the ground truth from the partial information offered by a suboptimal local minimum. This geometry-inspired algorithm recovers the ground truth for certain microscopy problems and also exhibits promising performance on the more challenging image deblurring problem. Our insights into the global geometry and the two-stage algorithm extend to the convolutional dictionary learning problem, where a superposition of multiple convolutional signals is observed.
http://arxiv.org/abs/1901.01913
Analysing how people react to rumours associated with news in social media is an important task for preventing the spread of misinformation, which is nowadays widely recognized as a dangerous tendency. In social media conversations, users show different stances and attitudes towards rumourous stories. Some users take a definite stance, supporting or denying the rumour at issue, while others just comment on it or ask for additional evidence related to the veracity of the rumour. Along these lines, a new shared task was proposed at SemEval-2017 (Task 8, SubTask A), focused on rumour stance classification in English tweets. The goal is to predict user stance towards emerging rumours on Twitter, in terms of supporting, denying, querying, or commenting on the original rumour, looking at the conversation threads originated by the rumour. This paper describes a new approach to this task, exploring the use of conversation-based and affective-based features that cover different facets of affect. Our classification model outperforms the best-performing systems for stance classification at SemEval-2017 Task 8, showing the effectiveness of the proposed feature set.
http://arxiv.org/abs/1901.01911
Scale variation is one of the key challenges in object detection. In this work, we first present a controlled experiment to investigate the effect of receptive fields on the detection of objects at different scales. Based on the findings from these exploration experiments, we propose a novel Trident Network (TridentNet) that aims to generate scale-specific feature maps with a uniform representational power. We construct a parallel multi-branch architecture in which each branch shares the same transformation parameters but has a different receptive field. We then propose a scale-aware training scheme to specialize each branch by sampling object instances of the proper scales for training. As a bonus, a fast approximation version of TridentNet achieves significant improvements without any additional parameters or computational cost. On the COCO dataset, our TridentNet with a ResNet-101 backbone achieves state-of-the-art single-model results, obtaining an mAP of 48.4. Code will be made publicly available.
http://arxiv.org/abs/1901.01892
We present an approach that learns to synthesize high-quality, novel views of 3D objects or scenes, while providing fine-grained and precise control over the 6-DOF viewpoint. The approach is self-supervised and only requires 2D images and associated view transforms for training. Our main contribution is a network architecture that leverages a transforming auto-encoder in combination with a depth-guided warping procedure to predict geometrically accurate unseen views. Leveraging geometric constraints renders direct supervision via depth or flow maps unnecessary. If large parts of the object are occluded in the source view, a purely learning based prior is used to predict the values for dis-occluded pixels. Our network furthermore predicts a per-pixel mask, used to fuse depth-guided and pixel-based predictions. The resulting images reflect the desired 6-DOF transformation and details are preserved. We thoroughly evaluate our architecture on synthetic and real scenes and under fine-grained and fixed-view settings. Finally, we demonstrate that the approach generalizes to entirely unseen images such as product images downloaded from the internet.
http://arxiv.org/abs/1901.01880
In this work, we address the two coupled tasks of gaze prediction and action recognition in egocentric videos by exploring their mutual context. Our assumption is that during the performance of a manipulation task, what a person is doing determines where the person is looking, and the gaze point reveals gaze and non-gaze regions that contain important and complementary information about the ongoing action. We propose a novel mutual context network (MCN) that jointly learns action-dependent gaze prediction and gaze-guided action recognition in an end-to-end manner. Experiments on public egocentric video datasets demonstrate that our MCN achieves state-of-the-art performance in both gaze prediction and action recognition.
http://arxiv.org/abs/1901.01874
We propose a planning-based method to teach an agent to manage a portfolio from scratch. Our approach combines deep reinforcement learning techniques with search techniques, as in AlphaGo. By uniting the advantages of the A* search algorithm with Monte Carlo tree search, we propose a new algorithm, named A* tree search, in which the best information found is returned to guide the next search. The expansion mode of the Monte Carlo tree is also improved for higher utilization of the neural network. The suggested algorithm can also optimize non-differentiable utility functions through combinatorial search. This technique is then used in our trading system. The major component is a neural network that is trained on trading experiences from tree search and outputs prior probabilities that in turn guide the search by pruning away branches. Experimental results on simulated and real financial data verify the robustness of the proposed trading system, which produces better strategies than several approaches based on reinforcement learning.
http://arxiv.org/abs/1901.01855
Self-supervised learning from multimodal image and text data allows deep neural networks to learn powerful features with no need for human-annotated data. Web and social media platforms provide a virtually unlimited amount of this multimodal data. In this work, we propose to exploit this freely available data to learn a multimodal image and text embedding, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the proposed pipeline can learn from images with associated text without supervision, and we analyze the semantic structure of the learnt joint image and text embedding space. We perform a thorough analysis and performance comparison of five different state-of-the-art text embeddings on three different benchmarks. We show that the embeddings learnt with Web and social media data achieve competitive performance against supervised methods on the text-based image retrieval task, and we clearly outperform the state of the art on the MIRFlickr dataset when training on the target data. Further, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed of Instagram images and their associated texts, that can be used for fair comparison of image-text embeddings.
http://arxiv.org/abs/1901.02004
In this paper, we propose an interactive matching network (IMN) to enhance the representations of contexts and responses at both the word level and sentence level for the multi-turn response selection task. First, IMN constructs word representations from three aspects to address the challenge of out-of-vocabulary (OOV) words. Second, an attentive hierarchical recurrent encoder (AHRE), which is capable of encoding sentences hierarchically and generating more descriptive representations by aggregating with an attention mechanism, is designed. Finally, the bidirectional interactions between whole multi-turn contexts and response candidates are calculated to derive the matching information between them. Experiments on four public datasets show that IMN significantly outperforms the baseline models by large margins on all metrics, achieving new state-of-the-art performance and demonstrating compatibility across domains for multi-turn response selection.
http://arxiv.org/abs/1901.01824
In this paper we address the problem of multi-cue affect recognition in challenging environments such as child-robot interaction. Towards this goal, we propose a method for the automatic recognition of affect that leverages body expressions alongside facial expressions, as opposed to traditional methods that usually focus only on the latter. We evaluate our method on a challenging child-robot interaction database of emotional expressions, as well as on a database of emotional expressions by actors, and show that the proposed method achieves significantly better results than the facial expression baselines, can be trained both jointly and separately, and offers computational models both for the individual modalities and for whole-body emotion.
http://arxiv.org/abs/1901.01805
Although simple individually, artificial neurons provide state-of-the-art performance when interconnected in deep networks. Unknown to many, there exists an arguably even simpler and more versatile learning mechanism, namely, the Tsetlin Automaton. Merely by means of a single integer as memory, it learns the optimal action in stochastic environments through increment and decrement operations. In this paper, we introduce the Tsetlin Machine, which solves complex pattern recognition problems with easy-to-interpret propositional formulas, composed by a collective of Tsetlin Automata. To eliminate the longstanding problem of vanishing signal-to-noise ratio, the Tsetlin Machine orchestrates the automata using a novel game. Our theoretical analysis establishes that the Nash equilibria of the game align with the propositional formulas that provide optimal pattern recognition accuracy. This translates to learning without local optima, only global ones. We argue that the Tsetlin Machine finds the propositional formula that provides optimal accuracy, with probability arbitrarily close to unity. In five benchmarks, the Tsetlin Machine provides performance competitive with Support Vector Machines, Random Forests, the Naive Bayes Classifier, Logistic Regression, and Neural Networks. The Tsetlin Machine has an inherent computational performance advantage, since inputs, patterns, and outputs are all expressed as bits, while both recognition and learning rely on bit manipulation. The combination of accuracy, interpretability, and computational simplicity makes the Tsetlin Machine a promising tool for a wide range of domains. Being the first of its kind, we believe the Tsetlin Machine will kick-start new paths of research, with a potentially significant impact on the AI field and the applications of AI.
http://arxiv.org/abs/1804.01508
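The underlying automaton is easy to state in full. Below is a minimal two-action Tsetlin Automaton with 2N states, where states 1..N select action 0 and states N+1..2N select action 1; rewards deepen the current action's state and penalties push toward the opposite action. This sketches the memory mechanism only, not the Tsetlin Machine's clause game.

```python
class TsetlinAutomaton:
    def __init__(self, n_states_per_action=100):
        self.N = n_states_per_action
        self.state = self.N                  # start at the boundary, weakly preferring action 0

    def action(self):
        return 0 if self.state <= self.N else 1

    def reward(self):
        # Strengthen the current action by moving deeper into its half
        if self.action() == 0:
            self.state = max(1, self.state - 1)
        else:
            self.state = min(2 * self.N, self.state + 1)

    def penalize(self):
        # Weaken the current action by drifting toward the other half
        self.state += 1 if self.action() == 0 else -1
```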
Despite significant progress in image-based 3D scene flow estimation, the performance of such approaches has not yet reached the fidelity required by many applications. Simultaneously, these applications are often not restricted to image-based estimation: laser scanners provide a popular alternative to traditional cameras, for example in the context of self-driving cars, as they directly yield a 3D point cloud. In this paper, we propose to estimate 3D motion from such unstructured point clouds using a deep neural network. In a single forward pass, our model jointly predicts 3D scene flow as well as the 3D bounding box and rigid body motion of objects in the scene. While the prospect of estimating 3D scene flow from unstructured point clouds is promising, it is also a challenging task. We show that the traditional global representation of rigid body motion prohibits inference by CNNs, and propose a translation equivariant representation to circumvent this problem. For training our deep network, a large dataset is required. Because of this, we augment real scans from KITTI with virtual objects, realistically modeling occlusions and simulating sensor noise. A thorough comparison with classic and learning-based techniques highlights the robustness of the proposed approach.
http://arxiv.org/abs/1806.02170
In order to obtain a compact line segment-based map representation for the localization and planning of mobile robots, it is necessary to merge redundant line segments that physically represent the same part of the environment in different scans. In this paper, a consistent and efficient redundant line segment merging approach (CAE-RLSM) is proposed for online feature map building. The proposed CAE-RLSM is composed of two newly proposed modules: one-to-many incremental line segment merging (OTM-ILSM) and multi-processing global map adjustment (MP-GMA). Different from state-of-the-art offline merging approaches, the proposed CAE-RLSM can achieve real-time mapping performance: it not only reduces the redundancy of incremental merging with high efficiency, but also solves the problem of global map adjustment after loop closing to guarantee global consistency. Furthermore, a new correlation-based evaluation metric is proposed for the quality evaluation of line segment maps. This evaluation metric does not require manual measurement of environmental metric information; instead, it makes full use of globally consistent laser scans obtained by simultaneous localization and mapping (SLAM) systems to compare the performance of different line segment-based mapping approaches in an objective and fair manner. Comparative experimental results with respect to a mean shift-based offline redundant line segment merging approach (MS-RLSM) and an offline version of a one-to-one incremental line segment merging approach (OTO-ILSM), on both public datasets and a self-recorded dataset, are presented to show the superior performance of CAE-RLSM in terms of efficiency and map quality in different scenarios.
http://arxiv.org/abs/1901.01766
We explore the importance of spatial contextual information in human pose estimation. Most state-of-the-art pose networks are trained in a multi-stage manner and produce several auxiliary predictions for deep supervision. Following this principle, we present two conceptually simple yet computationally efficient modules, namely Cascade Prediction Fusion (CPF) and Pose Graph Neural Network (PGNN), to exploit the underlying contextual information. Cascade prediction fusion accumulates prediction maps from previous stages to extract informative signals. The resulting maps also function as a prior to guide prediction at subsequent stages. To promote spatial correlation among joints, our PGNN learns a structured representation of human pose as a graph. Direct message passing between different joints is enabled, and spatial relations are captured. These two modules require very limited computational complexity. Experimental results demonstrate that our method consistently outperforms previous methods on the MPII and LSP benchmarks.
http://arxiv.org/abs/1901.01760
Building on a specific formalization of analogical relationships of the form “A relates to B as C relates to D”, we establish a connection between two important subfields of artificial intelligence, namely analogical reasoning and kernel-based machine learning. More specifically, we show that so-called analogical proportions are closely connected to kernel functions on pairs of objects. Based on this result, we introduce the analogy kernel, which can be seen as a measure of how strongly four objects are in analogical relationship. As an application, we consider the problem of object ranking in the realm of preference learning, for which we develop a new method based on support vector machines trained with the analogy kernel. Our first experimental results for datasets from different domains (sports, education, tourism, etc.) are promising and suggest that our approach is competitive with state-of-the-art algorithms in terms of predictive accuracy.
http://arxiv.org/abs/1901.02001
We formulate a global Gan-Gross-Prasad conjecture for general spin groups. That is, we formulate a conjecture on a relation between periods of certain automorphic forms on $GSpin_{n+1} \times GSpin_n$ along the diagonal subgroup $GSpin_n$ and some $L$-values. To support the conjecture, we show that the conjecture holds for $n=2$ and $3$ and for certain cases for $n=4$.
https://arxiv.org/abs/1901.01746
In this work, we explore the correlation between people's trajectories and their head orientations. We argue that trajectory and head pose forecasting can be modelled as a joint problem. Recent approaches to trajectory forecasting leverage short-term trajectories (aka tracklets) of pedestrians to predict their future paths. In addition, sociological cues, such as the expected destination or pedestrian interactions, are often combined with tracklets. In this paper, we propose MiXing-LSTM (MX-LSTM) to capture the interplay between positions and head orientations (vislets) thanks to a joint unconstrained optimization of full covariance matrices during the LSTM backpropagation. We additionally exploit the head orientations as a proxy for visual attention when modeling social interactions. MX-LSTM predicts future pedestrians' locations and head poses, extending the standard capabilities of current approaches to long-term trajectory forecasting. Compared to the state of the art, our approach shows better performance on an extensive set of public benchmarks. MX-LSTM is particularly effective when people move slowly, i.e., the most challenging scenario for all other models. The proposed approach also allows for accurate predictions over a longer time horizon.
http://arxiv.org/abs/1901.02000