Online texts – across genres, registers, domains, and styles – are riddled with human stereotypes, expressed in overt or subtle ways. Word embeddings, trained on these texts, perpetuate and amplify these stereotypes, and propagate biases to machine learning models that use word embeddings as features. In this work, we propose a method to debias word embeddings in multiclass settings such as race and religion, extending the work of Bolukbasi et al. (2016) from binary settings such as binary gender. Next, we propose a novel methodology for the evaluation of multiclass debiasing. We demonstrate that our multiclass debiasing is robust and maintains efficacy on standard NLP tasks.
http://arxiv.org/abs/1904.04047
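The abstract describes extending hard debiasing from a binary to a multiclass bias subspace. Below is a minimal numpy sketch of one plausible way to do that: estimate the subspace by PCA over mean-centered group word vectors and project it out of an embedding. The group sets, dimensionality, and neutralization rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def bias_subspace(groups, k=2):
    """Estimate a multiclass bias subspace from sets of group-defining vectors.

    groups: list of arrays, each of shape (n_i, d), holding embeddings of words
            that define one social group (e.g. religion terms).
    Returns an orthonormal basis (k, d) via PCA on the mean-centered vectors."""
    centered = [g - g.mean(axis=0, keepdims=True) for g in groups]
    stacked = np.vstack(centered)
    # principal directions of the centered group vectors
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:k]

def neutralize(v, basis):
    """Remove the component of v that lies in the bias subspace, then renormalize."""
    proj = basis.T @ (basis @ v)
    w = v - proj
    return w / np.linalg.norm(w)

# toy usage with random vectors standing in for real group word embeddings
rng = np.random.default_rng(0)
groups = [rng.normal(size=(5, 50)) for _ in range(3)]   # three hypothetical groups
B = bias_subspace(groups, k=2)
debiased = neutralize(rng.normal(size=50), B)
```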
Many human activities take minutes to unfold. To represent them, related works opt for statistical pooling, which neglects the temporal structure, or for convolutional methods such as CNNs and non-local blocks. While successful in learning temporal concepts, the latter fall short of modeling minutes-long temporal dependencies. We propose VideoGraph, a method that achieves the best of both worlds: it represents minutes-long human activities and learns their underlying temporal structure. VideoGraph learns a graph-based representation for human activities. The graph, its nodes, and its edges are learned entirely from video datasets, making VideoGraph applicable to problems without node-level annotation. The result is improvements over related works on the Epic-Kitchen and Breakfast benchmarks. In addition, we demonstrate that VideoGraph is able to learn the temporal structure of human activities in minutes-long videos.
http://arxiv.org/abs/1905.05143
As robots make their way out of factories into human environments, outer space, and beyond, they require the skill to manipulate their environment in multifarious, unforeseeable circumstances. In this regard, pushing is an essential motion primitive that dramatically extends a robot’s manipulation repertoire. In this work, we review the robotic pushing literature. While focusing on work concerned with predicting the motion of pushed objects, we also cover relevant applications of pushing for planning and control. Beginning with analytical approaches, under which we also subsume physics engines, we then proceed to discuss work on learning models from data. In doing so, we dedicate a separate section to deep learning approaches, which have seen a recent upsurge in the literature. Concluding remarks and further research perspectives are given at the end of the paper.
http://arxiv.org/abs/1905.05138
Neural networks are vulnerable to adversarial attacks – small, visually imperceptible, crafted noise that, when added to the input, drastically changes the output. The most effective method of defending against these attacks is adversarial training. We analyze adversarially trained robust models to study their vulnerability to adversarial attacks at the level of the latent layers. Our analysis reveals that, contrary to the input layer, which is robust to adversarial attack, the latent layers of these robust models are highly susceptible to adversarial perturbations of small magnitude. Leveraging this information, we introduce a new technique, Latent Adversarial Training (LAT), which consists of fine-tuning the adversarially trained models to ensure robustness at the feature layers. We also propose Latent Attack (LA), a novel algorithm for constructing adversarial examples. LAT results in a minor improvement in test accuracy and leads to state-of-the-art adversarial accuracy against the universal first-order PGD attack, which we show for the MNIST, CIFAR-10, and CIFAR-100 datasets.
https://arxiv.org/abs/1905.05186
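As a rough illustration of the fine-tuning idea in LAT, the sketch below perturbs activations at a chosen latent layer with a PGD-style attack and minimizes the resulting loss. The split into a feature extractor g and a head h, the attack budget, and the step sizes are all assumptions; the paper's exact procedure may differ.

```python
import torch
import torch.nn.functional as F

# Hypothetical split of an adversarially trained model into a feature
# extractor g (up to the chosen latent layer) and a head h; the real LAT
# procedure and hyper-parameters may differ from this sketch.
def latent_adv_loss(g, h, x, y, eps=0.1, alpha=0.02, steps=5):
    z = g(x).detach()
    delta = torch.zeros_like(z, requires_grad=True)
    for _ in range(steps):                      # PGD-style attack in latent space
        loss = F.cross_entropy(h(z + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return F.cross_entropy(h(z + delta), y)     # fine-tuning loss on perturbed latents
```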
Visual motion estimation is an integral and well-studied challenge in autonomous navigation. Recent work has focused on addressing multimotion estimation, which is especially challenging in highly dynamic environments. Such environments not only comprise multiple, complex motions but also tend to exhibit significant occlusion. Previous work in multiple object tracking focuses on maintaining the integrity of object tracks but usually relies on specific appearance-based descriptors or constrained motion models. These approaches are very effective in specific applications but do not generalize to the full multimotion estimation problem. This paper extends the multimotion visual odometry (MVO) pipeline to estimate multiple motions through occlusion, including the camera egomotion, by employing physically founded motion priors. This allows the pipeline to consistently estimate the full trajectory of every motion in a scene and recognize when temporarily occluded motions become unoccluded. The estimation performance of the pipeline is evaluated on real-world data from the Oxford Multimotion Dataset.
http://arxiv.org/abs/1905.05121
Artificial intelligence succeeds in data-intensive applications, but it lacks the ability to learn from a limited number of examples. Few-Shot Learning (FSL) is proposed to tackle this problem: using prior knowledge, it can rapidly generalize to new tasks with limited supervised experience. To fully understand FSL, we conduct a survey. We first give a formal definition of FSL. We then identify the unreliable empirical risk minimizer as the core issue of FSL. Based on how prior knowledge is used to deal with this core issue, we categorize FSL methods from three perspectives: data, which uses prior knowledge to augment the supervised experience; model, which constrains the hypothesis space via prior knowledge; and algorithm, which uses prior knowledge to alter the search for the parameters of the best hypothesis in the hypothesis space. Under this unified taxonomy, we provide a thorough discussion of the pros and cons of the different categories. Finally, we propose possible directions for FSL in terms of problem setup, techniques, applications, and theories, in the hope of providing insights for future research.
http://arxiv.org/abs/1904.05046
We consider the problem of estimating a vector from its noisy measurements using a prior specified only through a denoising function. Recent work on plug-and-play priors (PnP) and regularization-by-denoising (RED) has shown the state-of-the-art performance of estimators under such priors in a range of imaging tasks. In this work, we develop a new block coordinate RED algorithm that decomposes a large-scale estimation problem into a sequence of updates over a small subset of the unknown variables. We theoretically analyze the convergence of the algorithm and discuss its relationship to the traditional proximal optimization. Our analysis complements and extends recent theoretical results for RED-based estimation methods. We numerically validate our method using several denoiser priors, including those based on convolutional neural network (CNN) denoisers.
http://arxiv.org/abs/1905.05113
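To make the block-coordinate idea concrete, here is a hedged numpy sketch for a linear inverse problem y = Ax + noise: each iteration updates one block of coordinates using the data-fit gradient plus the RED regularizer gradient λ(x − D(x)). The block schedule, step size, and denoiser are illustrative assumptions rather than the scheme analyzed in the paper.

```python
import numpy as np

def bc_red(y, A, denoiser, lam=0.1, step=1e-3, n_blocks=8, iters=200):
    """Minimal sketch of a block-coordinate RED-style update for y = A x + noise.
    'denoiser' is any function x -> D(x); the block schedule and step size
    are assumptions, not the paper's analyzed algorithm."""
    n = A.shape[1]
    x = np.zeros(n)
    blocks = np.array_split(np.arange(n), n_blocks)
    for it in range(iters):
        b = blocks[it % n_blocks]                       # cyclic block selection
        grad_data = A.T @ (A @ x - y)                   # gradient of 0.5 * ||Ax - y||^2
        grad_red = lam * (x - denoiser(x))              # RED regularizer gradient
        x[b] -= step * (grad_data[b] + grad_red[b])     # update only block b
    return x
```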
We extend the recent results of Arora et al. (2019) by a spectral analysis of representations corresponding to kernel and neural embeddings. They showed that, in a simple single-layer network, the alignment of the labels to the eigenvectors of the corresponding Gram matrix determines both the convergence of the optimization during training and the generalization properties. We show quantitatively that kernel and neural representations improve both optimization and generalization. We give results for the Gaussian kernel and its approximation by random Fourier features, as well as for embeddings produced by two-layer networks trained on different tasks.
http://arxiv.org/abs/1905.05095
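A small numpy sketch of the quantities involved: build random Fourier features approximating a Gaussian kernel, form the Gram matrix, and measure how much of the label vector's energy falls on each eigenvector. The feature dimension, bandwidth, and the specific alignment statistic reported in the paper are assumptions here.

```python
import numpy as np

def rff(X, D=500, gamma=1.0, seed=0):
    """Random Fourier features approximating the kernel exp(-gamma * ||x - x'||^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def label_alignment(feats, y):
    """Fraction of label energy captured by each eigenvector of the Gram matrix."""
    G = feats @ feats.T
    vals, vecs = np.linalg.eigh(G)
    proj = (vecs.T @ y) ** 2
    # sorted from largest eigenvalue to smallest
    return vals[::-1], (proj / proj.sum())[::-1]
```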
Demosaicking and denoising are the first steps of any camera image processing pipeline and are key for obtaining high-quality RGB images. A promising current research trend aims at solving these two problems jointly using convolutional neural networks. Due to the unavailability of ground-truth data, these networks cannot currently be trained using real RAW images; instead, they resort to simulated data. In this paper we present a method to learn demosaicking directly from mosaicked images, without requiring ground-truth RGB data. We apply this to learn joint demosaicking and denoising only from RAW images, thus enabling the use of real data. In addition, we show that for this application overfitting a network to a specific burst improves the quality of restoration for both demosaicking and denoising.
http://arxiv.org/abs/1905.05092
A caricature is an artistic rendering of a person’s picture in which certain striking characteristics are abstracted or exaggerated in order to create a humorous or sarcastic effect. For numerous caricature-related applications such as attribute recognition and caricature editing, face parsing is an essential pre-processing step that provides a complete understanding of the facial structure. However, current state-of-the-art face parsing methods require large amounts of pixel-level labeled data, and collecting such annotations for caricatures is tedious and labor-intensive. For real photos, in contrast, numerous labeled face parsing datasets exist. We therefore formulate caricature face parsing as a domain adaptation problem, where real photos play the role of the source domain, adapting to the target caricatures. Specifically, we first leverage a spatial transformer based network to enable shape domain shifts. A feed-forward style transfer network is then utilized to capture texture-level domain gaps. With these two steps, we synthesize face caricatures from real photos, and thus we can use the parsing ground truths of the original photos to learn the parsing model. Experimental results on synthetic and real caricatures demonstrate the effectiveness of the proposed domain adaptation algorithm. Code is available at: https://github.com/ZJULearning/CariFaceParsing .
http://arxiv.org/abs/1905.05091
In this paper, a novel statistical metric learning method is developed for spectral-spatial classification of hyperspectral images. First, the standard variance of the samples of each class in each batch is used to decrease the intra-class variance within each class. Then, the distances between the means of different classes are used to penalize the inter-class variance of the training samples. Finally, the standard variance between the means of different classes is added as an additional diversity term to repulse different classes from each other. Experiments have been conducted on two real-world hyperspectral image datasets, and the results show the effectiveness of the proposed statistical metric learning.
http://arxiv.org/abs/1905.05087
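The three terms above translate naturally into a batch-level loss. The PyTorch sketch below is one plausible reading, with a margin-based inter-class term and a simple variance-based diversity term; the exact formulation, weights, and margin used in the paper are assumptions.

```python
import torch

def statistical_metric_loss(feats, labels, margin=1.0, w=(1.0, 1.0, 1.0)):
    """Sketch of a loss combining the three terms described above; the precise
    formulation in the paper may differ."""
    classes = labels.unique()
    means = torch.stack([feats[labels == c].mean(0) for c in classes])
    # (1) intra-class term: spread of samples around their class mean
    intra = torch.stack(
        [feats[labels == c].var(0, unbiased=False).sum() for c in classes]).mean()
    # (2) inter-class term: encourage class means to stay at least 'margin' apart
    d = torch.cdist(means, means)
    off_diag = d[~torch.eye(len(classes), dtype=torch.bool)]
    inter = torch.clamp(margin - off_diag, min=0).mean()
    # (3) diversity term: variance among the class means themselves (repulsion)
    diversity = -means.var(0, unbiased=False).sum()
    return w[0] * intra + w[1] * inter + w[2] * diversity
```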
To address the issue that medical images suffer from severe blurring caused by the lack of high-frequency details during super-resolution reconstruction, a novel medical image super-resolution method based on a dense neural network and a blended attention mechanism is proposed. The proposed method adds blended attention blocks to a dense neural network (DenseNet), so that the network can concentrate more attention on the regions and channels with sufficient high-frequency details. Batch normalization layers are removed to avoid loss of high-frequency texture details. The final high-resolution medical image is obtained using deconvolutional layers at the very end of the network as up-sampling operators. Experimental results show that the proposed method achieves improvements of 0.05 dB to 11.25 dB in peak signal-to-noise ratio (PSNR) and 0.6% to 14.04% in the structural similarity index (SSIM), compared with mainstream image super-resolution methods. This work provides a new idea for theoretical studies of medical image super-resolution reconstruction.
http://arxiv.org/abs/1905.05084
Many real-world decision problems are characterized by multiple conflicting objectives which must be balanced based on their relative importance. In the dynamic weights setting, the relative importance changes over time, and specialized algorithms that deal with such change, such as the tabular Reinforcement Learning (RL) algorithm by Natarajan and Tadepalli (2005), are required. However, this earlier work is not feasible for RL settings that necessitate the use of function approximators. We generalize across weight changes and high-dimensional inputs by proposing a multi-objective Q-network whose outputs are conditioned on the relative importance of objectives, and we introduce Diverse Experience Replay (DER) to counter the inherent non-stationarity of the dynamic weights setting. We perform an extensive experimental evaluation, compare our methods to adapted algorithms from Deep Multi-Task/Multi-Objective Reinforcement Learning, and show that our proposed network in combination with DER dominates these adapted algorithms across weight change scenarios and problem domains.
http://arxiv.org/abs/1809.07803
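The weight-conditioned Q-network can be pictured as a standard DQN that takes the current objective weights as an extra input, emits one Q-value per action per objective, and scalarizes them with the same weights for action selection. The PyTorch sketch below is a hedged illustration; layer sizes and the conditioning scheme are assumptions, and DER is not shown.

```python
import torch
import torch.nn as nn

class ConditionedMOQNet(nn.Module):
    """Sketch of a Q-network conditioned on the current objective weights;
    layer sizes and the exact conditioning used in the paper are assumptions."""
    def __init__(self, state_dim, n_actions, n_objectives, hidden=128):
        super().__init__()
        self.n_actions, self.n_objectives = n_actions, n_objectives
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_objectives, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions * n_objectives),
        )

    def forward(self, state, weights):
        q = self.net(torch.cat([state, weights], dim=-1))
        return q.view(-1, self.n_actions, self.n_objectives)   # per-objective Q-values

    def act(self, state, weights):
        q = self.forward(state, weights)
        scalarized = (q * weights.unsqueeze(1)).sum(-1)         # weight-scalarized Q
        return scalarized.argmax(dim=-1)
```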
With the evolution of various advanced driver assistance system (ADAS) platforms, the design of autonomous driving systems is becoming more complex and safety-critical. An autonomous driving system activates multiple ADAS functions simultaneously, and thus it is essential to coordinate the various ADAS functions. This paper proposes a randomized adversarial imitation learning (RAIL) method that imitates the coordination behavior of an autonomous vehicle equipped with advanced sensors. The RAIL policies are trained through derivative-free optimization for the decision maker that coordinates the proper ADAS functions, e.g., smart cruise control and the lane keeping system. In particular, the proposed method is also able to deal with LIDAR data and makes decisions in complex multi-lane highways and multi-agent environments.
https://arxiv.org/abs/1905.05637
Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today’s object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold weapon era. This paper extensively reviews 400+ papers on object detection in light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019). A number of topics are covered, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of detection systems, speed-up techniques, and the recent state-of-the-art detection methods. The paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and makes an in-depth analysis of their challenges as well as technical improvements in recent years.
http://arxiv.org/abs/1905.05055
Multiple clustering aims at exploring alternative clusterings to organize the data into meaningful groups from different perspectives. Existing multiple clustering algorithms are designed for single-view data. We assume that the individuality and commonality of multi-view data can be leveraged to generate high-quality and diverse clusterings. To this end, we propose a novel multi-view multiple clustering (MVMC) algorithm. MVMC first adapts multi-view self-representation learning to explore the individuality encoding matrices and the shared commonality matrix of multi-view data. It additionally reduces the redundancy (i.e., enhances the individuality) among the matrices using the Hilbert-Schmidt Independence Criterion (HSIC), and collects shared information by forcing the shared matrix to be smooth across all views. It then uses matrix factorization on the individual matrices, along with the shared matrix, to generate diverse clusterings of high quality. We further extend multiple co-clustering to multi-view data and propose a solution called multi-view multiple co-clustering (MVMCC). Our empirical study shows that MVMC (MVMCC) can exploit multi-view data to generate multiple high-quality and diverse clusterings (co-clusterings), with superior performance to the state-of-the-art methods.
http://arxiv.org/abs/1905.05053
In this work, we present a domain flow generation (DLOW) model to bridge two different domains by generating a continuous sequence of intermediate domains flowing from one domain to the other. The benefits of our DLOW model are two-fold. First, it is able to transfer source images into different styles in the intermediate domains. The transferred images smoothly bridge the gap between the source and target domains, thus easing the domain adaptation task. Second, when multiple target domains are provided for training, our DLOW model is also able to generate new styles of images that are unseen in the training data. We implement our DLOW model based on CycleGAN. A domainness variable is introduced to guide the model to generate the desired intermediate domain images. In the inference phase, a flow of images in various styles can be obtained by varying the domainness variable. We demonstrate the effectiveness of our model for both cross-domain semantic segmentation and style generalization tasks on benchmark datasets. Our implementation is available at https://github.com/ETHRuiGong/DLOW.
http://arxiv.org/abs/1812.05418
Automated keyphrase extraction is a crucial textual information processing task for most types of digital content management systems. It concerns the selection of representative and characteristic phrases from a document that express all aspects related to its content. This article introduces the task of keyphrase extraction and provides a well-organized and comprehensive overview of existing work. Moreover, it discusses the different evaluation approaches, giving meaningful insights and highlighting open issues. Finally, a comparative experimental study of popular unsupervised techniques on five datasets is presented.
http://arxiv.org/abs/1905.05044
We propose the use of a stochastic variational frame prediction deep neural network with a learned prior distribution trained on two-dimensional rain radar reflectivity maps for precipitation nowcasting with lead times of up to 2 1/2 hours. We present a comparison to a standard convolutional LSTM network and assess the evolution of the structural similarity index for both methods. Case studies are presented that illustrate that the novel methodology can yield meaningful forecasts without excessive blur for the time horizons of interest.
http://arxiv.org/abs/1905.05037
The state of an object is an important piece of knowledge in robotics applications. States and objects are intertwined, meaning that object information can help recognize the state in an image and vice versa. This paper addresses the state identification problem in cooking-related images and uses state and object predictions together to improve the classification accuracy of objects and their states from a single image. The pipeline presented in this paper includes a CNN with a double classification layer and the Concept-Net language knowledge graph on top. The language knowledge creates a semantic likelihood between objects and states. The resulting object and state confidences from the deep architecture are used together with object-state relatedness estimates from the language knowledge graph to produce marginal probabilities for objects and states. The marginal probabilities and confidences of objects (or states) are fused together to improve the final object (or state) classification results. Experiments on a dataset of cooking objects show that using a language knowledge graph on top of a deep neural network effectively enhances object and state classification.
http://arxiv.org/abs/1905.08843
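One simple way to realize the fusion described above is to form a joint object-state score from the two confidence vectors and a relatedness matrix, marginalize it, and blend the marginals back into the original confidences. The numpy sketch below does exactly that; the multiplicative fusion rule and the blending weight are assumptions, not the paper's exact formulation.

```python
import numpy as np

def fuse_object_state(conf_obj, conf_state, relatedness, alpha=0.5):
    """Sketch of fusing classifier confidences with knowledge-graph relatedness.
    relatedness[i, j] scores how plausible state j is for object i; the exact
    fusion rule used in the paper is an assumption here."""
    joint = np.outer(conf_obj, conf_state) * relatedness     # joint object-state score
    joint /= joint.sum()
    marg_obj = joint.sum(axis=1)                             # marginal over states
    marg_state = joint.sum(axis=0)                           # marginal over objects
    # blend the marginals with the original confidences
    obj = alpha * conf_obj + (1 - alpha) * marg_obj
    state = alpha * conf_state + (1 - alpha) * marg_state
    return obj / obj.sum(), state / state.sum()
```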
Text to speech (TTS) and automatic speech recognition (ASR) are two dual tasks in speech processing, and both achieve impressive performance thanks to recent advances in deep learning and the large amount of aligned speech and text data. However, the lack of aligned data poses a major practical problem for TTS and ASR on low-resource languages. In this paper, by leveraging the dual nature of the two tasks, we propose an almost unsupervised learning method that only leverages a few hundred paired speech-text samples and extra unpaired data for TTS and ASR. Our method consists of the following components: (1) a denoising auto-encoder, which reconstructs speech and text sequences respectively to develop the capability of language modeling in both the speech and text domains; (2) dual transformation, where the TTS model transforms the text $y$ into speech $\hat{x}$, and the ASR model leverages the transformed pair $(\hat{x},y)$ for training, and vice versa, to boost the accuracy of the two tasks; (3) bidirectional sequence modeling, which addresses error propagation, especially in long speech and text sequences, when training with few paired data; (4) a unified model structure, which combines all the above components for TTS and ASR based on the Transformer model. Our method achieves a 99.84% word-level intelligibility rate and 2.68 MOS for TTS, and 11.7% PER for ASR on the LJSpeech dataset, by leveraging only 200 paired speech and text samples (about 20 minutes of audio), together with extra unpaired speech and text data.
http://arxiv.org/abs/1905.06791
Low-rank tensor approximation is very promising for the compression of deep neural networks. We propose a new, simple and efficient iterative approach, which alternates low-rank factorization with a smart rank selection and fine-tuning. We demonstrate the efficiency of our method compared to non-iterative ones. Our approach improves the compression rate while maintaining accuracy for a variety of tasks.
http://arxiv.org/abs/1903.09973
This paper focuses on the problem of hierarchical non-overlapping clustering of a dataset. In such a clustering, each data item is associated with exactly one leaf node and each internal node is associated with all the data items stored in the sub-tree beneath it, so that each level of the hierarchy corresponds to a partition of the dataset. We develop a novel Bayesian nonparametric method combining the nested Chinese Restaurant Process (nCRP) and the Hierarchical Dirichlet Process (HDP). Compared with other existing Bayesian approaches, our solution tackles data with complex latent mixture features which has not been previously explored in the literature. We discuss the details of the model and the inference procedure. Furthermore, experiments on three datasets show that our method achieves solid empirical results in comparison with existing algorithms.
http://arxiv.org/abs/1905.05022
While current General Game Playing (GGP) systems facilitate useful research in Artificial Intelligence (AI) for game-playing, they are often somewhat specialized and computationally inefficient. In this paper, we describe an initial version of a “ludemic” general game system called Ludii, which has the potential to provide an efficient tool for AI researchers as well as game designers, historians, educators and practitioners in related fields. Ludii defines games as structures of ludemes, i.e. high-level, easily understandable game concepts. We establish the foundations of Ludii by outlining its main benefits: generality, extensibility, understandability and efficiency. Experimentally, Ludii outperforms one of the most efficient Game Description Language (GDL) reasoners, based on a propositional network, for all available games in the Tiltyard GGP repository.
http://arxiv.org/abs/1905.05013
Cracks on a painting are not a defect but an inimitable signature of an artwork, which can be used for origin examination, aging monitoring, damage identification, and even forgery detection. This work presents the development of a new methodology and corresponding toolbox for the extraction and characterization of information from an image of a craquelure pattern. The proposed approach processes the craquelure network as a graph. The graph representation captures the network structure via the mutual organization of junctions and fractures, and is invariant to any geometrical distortion. At the same time, our tool extracts the properties of each node and edge individually, which allows the pattern to be characterized statistically. We illustrate the benefits of the graph representation and of the statistical features individually, using a novel graph neural network and hand-crafted descriptors, respectively. However, we also show that the best performance is achieved when both techniques are merged into one framework. We perform experiments on a dataset for paintings’ origin classification and demonstrate that our approach outperforms existing techniques by a large margin.
http://arxiv.org/abs/1905.05010
This paper introduces a ROS Multi Ontology References (ARMOR) service, a general-purpose and scalable interface between robot architectures and OWL reasoners. ARMOR addresses synchronisation and communication issues among heterogeneous and distributed software components. As a guiding scenario, we consider a prototyping approach for the use of symbolic reasoning in human-robot interaction applications.
http://arxiv.org/abs/1706.10151
When constructing models that learn from noisy labels produced by multiple annotators, it is important to accurately estimate the reliability of annotators. Annotators may provide labels of inconsistent quality due to their varying expertise and reliability in a domain. Previous studies have mostly focused on estimating each annotator’s overall reliability on the entire annotation task. However, in practice, the reliability of an annotator may depend on each specific instance. Only a limited number of studies have investigated modelling per-instance reliability and these only considered binary labels. In this paper, we propose an unsupervised model which can handle both binary and multi-class labels. It can automatically estimate the per-instance reliability of each annotator and the correct label for each instance. We specify our model as a probabilistic model which incorporates neural networks to model the dependency between latent variables and instances. For evaluation, the proposed method is applied to both synthetic and real data, including two labelling tasks: text classification and textual entailment. Experimental results demonstrate our novel method can not only accurately estimate the reliability of annotators across different instances, but also achieve superior performance in predicting the correct labels and detecting the least reliable annotators compared to state-of-the-art baselines.
http://arxiv.org/abs/1905.04981
Due to the high computational demands, executing a rigorous comparison between hyperparameter optimization (HPO) methods is often cumbersome. The goal of this paper is to facilitate a better empirical evaluation of HPO methods by providing benchmarks that are cheap to evaluate but still represent realistic use cases. We believe these benchmarks provide an easy and efficient way to conduct reproducible experiments for neural hyperparameter search. Our benchmarks consist of a large grid of configurations of a feed-forward neural network on four different regression datasets, including architectural hyperparameters and hyperparameters concerning the training pipeline. Based on this data, we first performed an in-depth analysis to gain a better understanding of the properties of the optimization problem, as well as of the importance of different types of hyperparameters. Second, we exhaustively compared various state-of-the-art methods from the hyperparameter optimization literature on these benchmarks in terms of performance and robustness.
https://arxiv.org/abs/1905.04970
We show that implicit filter-level sparsity manifests in convolutional neural networks (CNNs) which employ Batch Normalization and ReLU activation and are trained with adaptive gradient descent techniques and L2 regularization or weight decay. Through an extensive empirical study (Mehta et al., 2019) we hypothesize the mechanism behind the sparsification process and find surprising links to certain filter sparsification heuristics proposed in the literature. The emergence, and subsequent pruning, of selective features is observed to be one of the contributing mechanisms, leading to feature sparsity on par with or better than certain explicit sparsification / pruning approaches. In this workshop article we summarize our findings and point out corollaries of selective-feature penalization which could also be employed as heuristics for filter pruning.
http://arxiv.org/abs/1905.04967
Estimating tactile properties, such as slipperiness or roughness, from vision is important for effectively interacting with the environment. These tactile properties help us decide which actions we should choose and how to perform them. For example, we can drive more slowly if we see that we have bad traction, or grasp more tightly if an item looks slippery. We believe that this ability also helps robots enhance their understanding of the environment, and thus enables them to tailor their actions to the situation at hand. We therefore propose a model to estimate the degree of tactile properties from visual perception alone (e.g., the level of slipperiness or roughness). Our method extends an encoder-decoder network, in which the latent variables are visual and tactile features. In contrast to previous works, our method does not require manual labeling, but only RGB images and the corresponding tactile sensor data. All our data is collected with a webcam and a uSkin tactile sensor mounted on the end-effector of a Sawyer robot, which strokes the surfaces of 25 different materials. We show that our model generalizes to materials not included in the training data by evaluating the feature space, indicating that it has learned to associate important tactile properties with images.
http://arxiv.org/abs/1803.03435
The design of mechanisms that encourage pro-social behaviours in populations of self-regarding agents is recognised as a major theoretical challenge within several areas of social, life and engineering sciences. When interference from external parties is considered, several heuristics have been identified as capable of engineering a desired collective behaviour at a minimal cost. However, these studies neglect the diverse nature of contexts and social structures that characterise real-world populations. Here we analyse the impact of diversity by means of scale-free interaction networks with high and low levels of clustering, and test various interference paradigms using simulations of agents facing a cooperative dilemma. Our results show that interference on scale-free networks is not trivial and that distinct levels of clustering react differently to each interference strategy. As such, we argue that no tailored response fits all scale-free networks and present which strategies are more efficient at fostering cooperation in both types of networks. Finally, we discuss the pitfalls of considering reckless interference strategies.
http://arxiv.org/abs/1905.04964
Viewpoint estimation for known categories of objects has been improved significantly thanks to deep networks and large datasets, but generalization to unknown categories is still very challenging. With an aim towards improving performance on unknown categories, we introduce the problem of category-level few-shot viewpoint estimation. We design a novel framework to successfully train viewpoint networks for new categories with few examples (10 or less). We formulate the problem as one of learning to estimate category-specific 3D canonical shapes, their associated depth estimates, and semantic 2D keypoints. We apply meta-learning to learn weights for our network that are amenable to category-specific few-shot fine-tuning. Furthermore, we design a flexible meta-Siamese network that maximizes information sharing during meta-learning. Through extensive experimentation on the ObjectNet3D and Pascal3D+ benchmark datasets, we demonstrate that our framework, which we call MetaView, significantly outperforms fine-tuning the state-of-the-art models with few examples, and that the specific architectural innovations of our method are crucial to achieving good performance.
http://arxiv.org/abs/1905.04957
Unmanned Aerial Vehicle (UAV) aided wireless networks have recently been envisioned as a solution to provide a reliable, low-latency cellular link for search and rescue operations over the sea. We propose three different network architectures, based on the technology deployed on the UAV: a flying relay, a flying Base Station (BS) and a flying Remote Radio Head (RRH). We describe the challenges and highlight the benefits of the proposed architectures from the perspective of search and rescue operations over the sea. We compare the performance in terms of data rate and latency, analyzing different solutions to provide a Backhaul (BH)/Fronthaul (FH) link for long-range coverage over the sea. Results show that no single architecture outperforms the others; a cost function is therefore indicated as a tool to find a suboptimal solution.
https://arxiv.org/abs/1905.04954
We demonstrate the top-down fabrication of ordered arrays of GaN nanowires by selective area sublimation of pre-patterned GaN(0001) layers grown by hydride vapor phase epitaxy on Al$_2$O$_3$. Arrays with nanowire diameters and spacings ranging from 50 to 90 nm and 0.1 to 0.7 $\mu$m, respectively, are simultaneously produced under identical conditions. The sublimation process, carried out under high vacuum conditions, is analyzed \emph{in situ} by reflection high-energy electron diffraction and line-of-sight quadrupole mass spectrometry. During the sublimation process, the GaN(0001) surface vanishes, giving way to the formation of semi-polar $\lbrace1\bar{1}03\rbrace$ facets which decompose congruently following an Arrhenius temperature dependence with an activation energy of ($3.54 \pm 0.07$) eV and an exponential prefactor of $1.58\times10^{31}$ atoms cm$^{-2}$ s$^{-1}$. The analysis of the samples by low-temperature cathodoluminescence spectroscopy reveals that, in contrast to dry etching, the sublimation process does not introduce nonradiative recombination centers at the nanowire sidewalls. This technique is suitable for the top-down fabrication of a variety of ordered nanostructures, and could possibly be extended to other material systems with similar crystallographic properties such as ZnO.
https://arxiv.org/abs/1905.04948
Generalized zero-shot learning (GZSL) is the problem of learning a classifier where some classes have samples and others are learned from side information, like semantic attributes or text descriptions, in a zero-shot learning (ZSL) fashion. Training a single model that operates in these two regimes simultaneously is challenging. Here we describe a probabilistic approach that breaks the model into three modular components and then combines them in a consistent way. Specifically, our model consists of three classifiers: a “gating” model that makes soft decisions about whether a sample is from a “seen” class, and two experts: a ZSL expert and an expert model for seen classes. We address two main difficulties in this approach: how to provide an accurate estimate of the gating probability without any training samples for unseen classes, and how to use an expert’s predictions when it observes samples outside of its domain. The key insight of our approach is to pass information between the three models to improve each one’s accuracy, while maintaining the modular structure. We test our approach, adaptive confidence smoothing (COSMO), on four standard GZSL benchmark datasets and find that it largely outperforms state-of-the-art GZSL models. COSMO is also the first model that closes the gap and surpasses the performance of generative models for GZSL, even though it is a light-weight model that is much easier to train and tune.
http://arxiv.org/abs/1812.09903
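At its core, the gating combination is a mixture of the two experts' class posteriors weighted by the probability that the sample comes from a seen class. The numpy sketch below shows that combination in its simplest form; COSMO's confidence-smoothing step and the way the gate itself is estimated are omitted, and the function names are illustrative.

```python
import numpy as np

def combine_experts(p_seen_cls, p_unseen_cls, p_is_seen, n_seen, n_unseen):
    """Sketch of the soft gating combination: the gate decides how much weight
    the seen-class expert and the ZSL expert each receive. The smoothing step
    used by COSMO is not shown here."""
    full = np.zeros(n_seen + n_unseen)
    full[:n_seen] = p_is_seen * p_seen_cls            # expert for seen classes
    full[n_seen:] = (1 - p_is_seen) * p_unseen_cls    # ZSL expert for unseen classes
    return full / full.sum()

# toy usage: the gate says 0.7 probability the sample comes from a seen class
scores = combine_experts(np.array([0.6, 0.4]), np.array([0.9, 0.1]), 0.7, 2, 2)
```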
A voting center is in charge of collecting and aggregating voter preferences. In an iterative process, the center sends comparison queries to voters, requesting them to submit their preference between two items. Voters might discuss the candidates among themselves, figuring out during the elicitation process which candidates stand a chance of winning and which do not. Consequently, strategic voters might attempt to manipulate by deviating from their true preferences and instead submit a different response in order to attempt to maximize their profit. We provide a practical algorithm for strategic voters which computes the best manipulative vote and maximizes the voter’s selfish outcome when such a vote exists. We also provide a careful voting center which is aware of the possible manipulations and avoids manipulative queries when possible. In an empirical study on four real-world domains, we show that in practice manipulation occurs in a low percentage of settings and has a low impact on the final outcome. The careful voting center reduces manipulation even further, thus allowing for a non-distorted group decision process to take place. We thus provide a core technology study of a voting process that can be adopted in opinion or information aggregation systems and in crowdsourcing applications, e.g., peer grading in Massive Open Online Courses (MOOCs).
http://arxiv.org/abs/1905.04933
One-Shot Neural Architecture Search (NAS) is a promising method to significantly reduce search time without any separate training. It can be treated as a network compression problem on the architecture parameters of an over-parameterized network. However, there are two issues associated with most one-shot NAS methods. First, dependencies between a node and its predecessors and successors are often disregarded, which results in improper treatment of zero operations. Second, pruning architecture parameters based on their magnitude is questionable. In this paper, we employ the classic Bayesian learning approach to alleviate these two issues by modeling architecture parameters using hierarchical automatic relevance determination (HARD) priors. Unlike other NAS methods, we train the over-parameterized network for only one epoch and then update the architecture. Impressively, this enabled us to find the architecture in both proxy and proxyless tasks on CIFAR-10 within only 0.2 GPU days using a single GPU. As a byproduct, our approach can be transferred directly to compressing convolutional neural networks by enforcing structural sparsity, which achieves extremely sparse networks without accuracy deterioration.
http://arxiv.org/abs/1905.04919
We study the problem of knowledge graph (KG) embedding. A widely established assumption for this problem is that similar entities are likely to have similar relational roles. However, existing related methods derive KG embeddings mainly based on triple-level learning, which lacks the capability of capturing long-term relational dependencies of entities. Moreover, triple-level learning is insufficient for the propagation of semantic information among entities, especially in the case of cross-KG embedding. In this paper, we propose recurrent skipping networks (RSNs), which employ a skipping mechanism to bridge the gaps between entities. RSNs integrate recurrent neural networks (RNNs) with residual learning to efficiently capture the long-term relational dependencies within and between KGs. We design an end-to-end framework to support RSNs on different tasks. Our experimental results show that RSNs outperform state-of-the-art embedding-based methods for entity alignment and achieve competitive performance for KG completion.
http://arxiv.org/abs/1905.04914
Despite recent advances in regularisation theory, the issue of parameter selection still remains a challenge for most applications. In a recent work, the framework of statistical learning was used to approximate the optimal Tikhonov regularisation parameter from noisy data. In this work, we improve these results and extend the analysis to elastic net regularisation, providing explicit error bounds on the accuracy of the approximated parameter and the corresponding regularisation solution in a simplified case. Furthermore, in the general case we design a data-driven, automated algorithm for the computation of an approximate regularisation parameter. Our analysis combines statistical learning theory with insights from regularisation theory. We compare our approach with state-of-the-art parameter selection criteria and illustrate its superiority in terms of accuracy and computational time on simulated and real data sets.
http://arxiv.org/abs/1809.08696
Multiple LiDARs have progressively emerged on autonomous vehicles for rendering a wide field of view and dense measurements. However, the lack of precise calibration negatively affects their potential applications in localization and perception systems. In this paper, we propose a novel system that enables automatic multi-LiDAR calibration without any calibration target, prior environmental information, or initial values of the extrinsic parameters. Our approach starts with a hand-eye calibration for automatic initialization by aligning the estimated motions of each sensor. The resulting parameters are then refined with an appearance-based method by minimizing a cost function constructed from point-plane correspondences. Experimental results on simulated and real-world data sets demonstrate the reliability and accuracy of our calibration approach. The proposed approach can calibrate a multi-LiDAR system with rotation and translation errors of less than 0.04 rad and 0.1 m, respectively, for a mobile platform.
http://arxiv.org/abs/1905.04912
How can we teach a robot to predict what will happen next for an activity it has never seen before? We address this problem of zero-shot anticipation by presenting a hierarchical model that generalizes instructional knowledge from large-scale text-corpora and transfers the knowledge to the visual domain. Given a portion of an instructional video, our model predicts coherent and plausible actions multiple steps into the future, all in rich natural language. To demonstrate the anticipation capabilities of our model, we introduce the Tasty Videos dataset, a collection of 2511 recipes for zero-shot learning, recognition and anticipation.
http://arxiv.org/abs/1812.02501
Robust principal component analysis (RPCA), which aims to estimate underlying low-rank and sparse structures from degraded observation data, has a wide range of applications in computer vision. It is usually relaxed to the principal component pursuit (PCP) model in order to obtain a convex problem, which leads to an undesirable over-shrinkage problem. In this paper, we propose a dual reweighted Lp-norm (DWLP) model with a more reasonable weighting rule and weaker powers, which greatly generalizes previous works and provides a better approximation to the rank minimization problem for the original matrix as well as the L0-norm minimization problem for the sparse noise. Moreover, an iterative reweighted algorithm is introduced to solve the proposed DWLP model by optimizing elements and weights alternately. We then apply the DWLP model to remove salt-and-pepper noise by exploiting image non-local self-similarity. Extensive experiments demonstrate that the proposed method outperforms other state-of-the-art methods in terms of both qualitative and quantitative evaluation. More precisely, our DWLP achieves improvements of about 6.814 dB, 4.80 dB, 3.142 dB, 1.20 dB and 0.1 dB on average over the current WSNM-RPCA under salt-and-pepper noise densities from 10% to 50% at 10% intervals, respectively.
http://arxiv.org/abs/1811.09173
Regional dropout strategies have been proposed to enhance the performance of convolutional neural network classifiers. They have proved to be effective for guiding the model to attend to less discriminative parts of objects (e.g., the leg as opposed to the head of a person), thereby letting the network generalize better and have better object localization capabilities. On the other hand, current methods for regional dropout remove informative pixels on training images by overlaying a patch of either black pixels or random noise. Such removal is not desirable because it leads to information loss and inefficiency during training. We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images, and the ground truth labels are mixed proportionally to the area of the patches. By making efficient use of training pixels and retaining the regularization effect of regional dropout, CutMix consistently outperforms the state-of-the-art augmentation strategies on the CIFAR and ImageNet classification tasks, as well as on the ImageNet weakly-supervised localization task. Moreover, unlike previous augmentation methods, our CutMix-trained ImageNet classifier, when used as a pretrained model, results in consistent performance gains on the Pascal detection and MS-COCO image captioning benchmarks. We also show that CutMix improves the model robustness against input corruptions and its out-of-distribution detection performance.
http://arxiv.org/abs/1905.04899
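The cut-and-mix step is compact enough to sketch directly. The PyTorch/numpy snippet below cuts a random box whose area follows a Beta-sampled ratio, pastes the corresponding region from a shuffled copy of the batch, and returns the two label sets with the exact pixel-ratio mixing coefficient; the official implementation may differ in details such as box sampling.

```python
import numpy as np
import torch

def cutmix(x, y, alpha=1.0):
    """Sketch of CutMix as described above: cut a patch from a shuffled copy of
    the batch (modifying x in place) and mix the labels by the patch-area ratio."""
    lam = np.random.beta(alpha, alpha)
    idx = torch.randperm(x.size(0))
    H, W = x.size(2), x.size(3)
    rh, rw = int(H * np.sqrt(1 - lam)), int(W * np.sqrt(1 - lam))
    cy, cx = np.random.randint(H), np.random.randint(W)
    y1, y2 = np.clip(cy - rh // 2, 0, H), np.clip(cy + rh // 2, 0, H)
    x1, x2 = np.clip(cx - rw // 2, 0, W), np.clip(cx + rw // 2, 0, W)
    x[:, :, y1:y2, x1:x2] = x[idx, :, y1:y2, x1:x2]          # paste patch from shuffled batch
    lam = 1 - (y2 - y1) * (x2 - x1) / (H * W)                # exact pixel-area ratio
    return x, y, y[idx], lam      # loss: lam * CE(pred, y_a) + (1 - lam) * CE(pred, y_b)
```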
Image feature extraction and matching is a fundamental but computation-intensive task in machine vision. This paper proposes a novel FPGA-based embedded system to accelerate feature extraction and matching. It implements SURF feature point detection as well as BRIEF feature descriptor construction and matching. For binocular stereo vision, feature matching includes both tracking matching and stereo matching, which simultaneously provide feature point correspondences and parallax information. Our system is evaluated on a ZYNQ XC7Z045 FPGA. The results demonstrate that it can process binocular video data at a high frame rate (640$\times$480 @ 162 fps). Moreover, extensive tests show that our system is robust to image compression, blurring and illumination changes.
http://arxiv.org/abs/1905.04890
Benefiting from advances in computer vision, natural language processing and information retrieval, visual question answering (VQA), which aims to answer questions about an image or a video, has received a lot of attention over the past few years. Although some progress has been achieved so far, several studies have pointed out that current VQA models are heavily affected by the language prior problem: they tend to answer questions based on the co-occurrence patterns of question keywords (e.g., how many) and answers (e.g., 2) instead of understanding images and questions. Existing methods attempt to solve this problem by either balancing the biased datasets or forcing models to better understand images. However, only marginal effects, and even performance deterioration, are observed for the first and second solutions, respectively. In addition, another important issue is the lack of a metric to quantitatively measure the extent of the language prior effect, which severely hinders the advancement of related techniques. In this paper, we make contributions from two perspectives. First, we design a metric to quantitatively measure the language prior effect of VQA models; the proposed metric has been demonstrated to be effective in our empirical studies. Second, we propose a regularization method (i.e., a score regularization module) to enhance current VQA models by alleviating the language prior problem as well as boosting the backbone model performance. The proposed score regularization module adopts a pair-wise learning strategy, which makes VQA models answer questions based on reasoning about the image (given the question) instead of relying on question-answer patterns observed in the biased training set. The score regularization module can be flexibly integrated into various VQA models.
http://arxiv.org/abs/1905.04877
The adversarial loss in a conditional generative adversarial network (GAN) is not designed to directly optimize the evaluation metrics of a target task, and thus may not always guide the generator to produce data with improved metric scores. To overcome this issue, we propose a novel MetricGAN approach that aims to optimize the generator with respect to one or multiple evaluation metrics. Moreover, based on MetricGAN, the metric scores of the generated data can also be arbitrarily specified by users. We tested the proposed MetricGAN on a speech enhancement task, which is particularly suitable for verifying the proposed approach because there are multiple metrics measuring different aspects of speech signals. Moreover, these metrics are generally complex and cannot be fully optimized by Lp or conventional adversarial losses.
http://arxiv.org/abs/1905.04874
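The core trick can be summarized in two losses: the discriminator is trained as a regressor of the (normalized) evaluation metric, and the generator is trained so that the discriminator's predicted metric for its output reaches a user-chosen target. The PyTorch sketch below shows this pairing; the network architectures, the metric function, and the normalization are assumptions rather than the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def metricgan_losses(D, G, noisy, clean, metric_fn, target=1.0):
    """Sketch of the MetricGAN idea: D learns to predict the (normalized)
    evaluation metric, and G is trained so that D scores its output at the
    desired target value. Networks and metric_fn are placeholder assumptions."""
    enhanced = G(noisy)
    # discriminator: regress the true metric of enhanced speech; clean scores 1.0
    d_loss = (F.mse_loss(D(enhanced.detach(), clean), metric_fn(enhanced.detach(), clean)) +
              F.mse_loss(D(clean, clean), torch.ones_like(D(clean, clean))))
    # generator: push the predicted metric of its output toward the target score
    g_loss = F.mse_loss(D(enhanced, clean), torch.full_like(D(enhanced, clean), target))
    return d_loss, g_loss
```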
Reliable and accurate prediction of time series plays a crucial role in the maritime industry, for example in economic investment, transportation planning, and port planning and design. Maritime time series grow dynamically and exhibit predominantly complex, nonlinear and non-stationary properties. To guarantee high-quality prediction performance, we propose to first adopt the empirical mode decomposition (EMD) and ensemble EMD (EEMD) methods to decompose the original time series into high- and low-frequency components. The low-frequency components can be easily predicted directly through traditional neural network (NN) methods. It is more difficult to predict the high-frequency components due to their weak mathematical regularity. To take advantage of the inherent self-similarities within the high-frequency components, these components are divided into several continuous small (overlapping) segments. The grouped segments with high similarities are then selected to form more suitable training datasets for traditional NN methods. This regrouping strategy assists in enhancing the prediction accuracy of the high-frequency components. The final prediction result is obtained by integrating the predicted high- and low-frequency components. Our proposed three-step prediction frameworks benefit from the time series decomposition and similar-segment grouping. Experiments on both port cargo throughput and vessel traffic flow illustrate superior performance in terms of prediction accuracy and robustness.
http://arxiv.org/abs/1905.04872
In this paper, we study generative models of sequential discrete data. To tackle the exposure bias problem inherent in maximum likelihood estimation (MLE), generative adversarial networks (GANs) have been introduced to penalize unrealistic generated samples. To exploit the supervision signal from the discriminator, most previous models leverage REINFORCE to address the non-differentiability of sequential discrete data. However, because of the instability of the training signal during the dynamic process of adversarial training, the effectiveness of REINFORCE in this case is hardly guaranteed. To deal with this problem, we propose a novel approach called Cooperative Training (CoT) to improve the training of sequence generative models. CoT transforms the min-max game of GANs into a joint maximization framework and manages to explicitly estimate and optimize the Jensen-Shannon divergence. Moreover, CoT works without the need for pre-training via MLE, which is crucial to the success of previous methods. In our experiments, compared to existing state-of-the-art methods, CoT shows superior or at least competitive performance on sample quality, diversity, and training stability.
http://arxiv.org/abs/1804.03782
Group re-identification (G-ReID) is an important yet under-studied task. Its challenges lie not only in the appearance changes of individuals, which have been well investigated in general person re-identification (ReID), but also in changes of group layout and membership. The key task of G-ReID is therefore to learn representations robust to such changes. To address this issue, we propose a Transferred Single and Couple Representation Learning Network (TSCN). Its merits are two-fold: 1) Due to the lack of labelled training samples, existing G-ReID methods mainly rely on unsatisfactory hand-crafted features. To gain the advantages of deep learning models, we treat a group as multiple persons and transfer a labeled ReID dataset to the style of the G-ReID target dataset to learn single representations. 2) Taking into account the neighborhood relationship in a group, we further propose learning a novel couple representation between two group members, which achieves more discriminative power in G-ReID tasks. In addition, an unsupervised weight learning method is exploited to adaptively fuse the results of different views according to result patterns. Extensive experimental results demonstrate the effectiveness of our approach, which significantly outperforms state-of-the-art methods by 11.7% CMC-1 on the Road Group dataset and by 39.0% CMC-1 on the DukeMCMT dataset.
http://arxiv.org/abs/1905.04854
Recent years have witnessed growing interest in designing efficient neural networks and in neural architecture search (NAS). Although remarkable efficiency and accuracy have been achieved, existing expert-designed and NAS models neglect the fact that input instances are of varying complexity and thus require different amounts of computation. Therefore, inference with a fixed model that processes all instances through the same transformations wastes plenty of computational resources. Customizing the model capacity in an instance-aware manner is highly desirable. In this paper, we introduce a novel network, ISBNet, to address this issue, which supports efficient instance-level inference by selectively bypassing transformation branches of infinitesimal importance weight. We also propose lightweight hypernetworks, SelectionNet, to generate these importance weights instance-wise. Extensive experiments have been conducted to evaluate the efficiency of ISBNet, and the results show that ISBNet achieves extremely efficient inference compared to existing networks. For example, ISBNet takes only 12.45% of the parameters and 45.79% of the FLOPs of the state-of-the-art efficient network ShuffleNetV2 with comparable accuracy.
http://arxiv.org/abs/1905.04849
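To make the selective-bypass idea concrete, the PyTorch sketch below wraps a set of parallel branches with a tiny gating network that predicts per-branch importance weights from the input and skips branches whose weight falls below a threshold. The gate architecture, the batch-level averaging of weights, and the threshold are all illustrative assumptions, not ISBNet's or SelectionNet's actual design.

```python
import torch
import torch.nn as nn

class GatedBranchBlock(nn.Module):
    """Sketch of instance-wise branch bypassing: a small selection network
    predicts per-branch importance weights from the input, and branches with
    negligible weight are skipped. Names and thresholds are assumptions."""
    def __init__(self, branches, in_channels, threshold=0.05):
        super().__init__()
        self.branches = nn.ModuleList(branches)
        self.selector = nn.Sequential(                 # lightweight SelectionNet-like gate
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_channels, len(branches)), nn.Softmax(dim=-1),
        )
        self.threshold = threshold

    def forward(self, x):
        w = self.selector(x).mean(0)                   # per-branch importance weights (batch-averaged)
        out = 0
        for weight, branch in zip(w, self.branches):
            if weight > self.threshold:                # bypass branches of tiny importance
                out = out + weight * branch(x)
        return out
```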