Generative Adversarial Networks (GANs) have gained momentum for their ability to model image distributions. They learn to emulate the training set, which enables sampling from that domain and using the learned knowledge for useful applications. Several methods have been proposed to enhance GANs, including regularizing the loss with some form of feature matching. We seek to push GANs beyond the training data and explore unseen territory in the image manifold. We first propose a new regularizer for GANs based on K-nearest neighbor (K-NN) selective feature matching to a target set Y in a high-level feature space, applied during adversarial training on the base set X, and we call this novel model K-GAN. We show that minimizing the added term follows from cross-entropy minimization between the distributions of the GAN and the set Y. We then introduce a cascaded framework that addresses the task of imagining a new distribution combining the base set X and the target set Y by cascading sampling GANs with translation GANs, and we dub such a cascade an Imaginative Adversarial Network (IAN). We conduct objective and subjective evaluations of different IAN setups on this task and show some useful applications of these IANs, such as manifold traversing and creative face generation for character design in movies or video games.
http://arxiv.org/abs/1904.07916
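Since the K-GAN abstract above describes the regularizer only at a high level, the following is a minimal PyTorch sketch of what K-NN selective feature matching could look like; the function names and the feature extractor are our assumptions, not the authors' code.

```python
# A minimal sketch (not the authors' code) of K-NN selective feature
# matching: generated samples are pulled toward their K nearest neighbours
# in a high-level feature space of the target set Y.
import torch

def knn_feature_matching_loss(gen_feats, target_feats, k=5):
    """gen_feats: (B, D) features of generated images;
    target_feats: (M, D) precomputed features of the target set Y."""
    dists = torch.cdist(gen_feats, target_feats) ** 2   # pairwise sq. distances
    knn_dists, _ = dists.topk(k, dim=1, largest=False)  # keep k closest targets
    return knn_dists.mean()

# Hypothetical use during adversarial training on the base set X, with a
# feature extractor F and generator G:
#   loss_G = adv_loss + lam * knn_feature_matching_loss(F(G(z)), F_Y)
```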
Modern machine learning datasets can have biases for certain representations that are leveraged by algorithms to achieve high performance without learning to solve the underlying task. This problem is referred to as “representation bias”. The question of how to reduce the representation bias of a dataset is investigated, and a new procedure, REPresentAtion bIas Removal (REPAIR), is proposed. It formulates bias minimization as an optimization problem, seeking a weight distribution that penalizes examples that are easy for a classifier built on a given feature representation. Bias reduction is then equated to maximizing the ratio between the classification loss on the reweighted dataset and the uncertainty of the ground-truth class labels. This is a minimax problem that REPAIR solves by alternately updating classifier parameters and dataset resampling weights using stochastic gradient descent. An experimental set-up is also introduced to measure the bias of any dataset for a given representation, and the impact of this bias on the performance of recognition models. Experiments with synthetic and action recognition data show that dataset REPAIR can significantly reduce representation bias and lead to improved generalization of models trained on REPAIRed datasets. The tools used for characterizing representation bias, and the proposed dataset REPAIR algorithm, are available at https://github.com/JerryYLi/Dataset-REPAIR/.
http://arxiv.org/abs/1904.07911
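As a concrete reading of the minimax formulation above, here is a hedged PyTorch sketch (structure and hyperparameters are our assumptions, not the released code): a linear classifier on a fixed representation descends on the weighted loss, while per-example weight logits ascend on the same objective, normalised by the entropy of the reweighted label distribution.

```python
# A hedged sketch (our assumptions, not the released code) of REPAIR's
# minimax reweighting: a linear classifier on a fixed representation
# minimises the weighted loss, while per-example weight logits maximise
# the same objective, normalised by the entropy of the reweighted labels.
import torch
import torch.nn.functional as F

def repair_weights(feats, labels, n_classes, steps=1000, lr=0.1):
    """feats: (N, D) fixed feature representation; labels: (N,) int64."""
    clf = torch.nn.Linear(feats.size(1), n_classes)
    opt = torch.optim.SGD(clf.parameters(), lr=lr)
    s = torch.zeros(feats.size(0), requires_grad=True)  # example-weight logits
    for _ in range(steps):
        w = torch.sigmoid(s)
        per_example = F.cross_entropy(clf(feats), labels, reduction='none')
        loss = (w * per_example).sum() / w.sum()
        # Entropy of the reweighted class prior (the bias measure's normaliser).
        prior = torch.zeros(n_classes).index_add_(0, labels, w) / w.sum()
        entropy = -(prior * (prior + 1e-12).log()).sum()
        objective = loss / entropy
        opt.zero_grad()
        s.grad = None
        objective.backward()
        opt.step()                      # classifier: gradient descent
        with torch.no_grad():
            s += lr * s.grad            # weights: gradient ascent
    return torch.sigmoid(s).detach()    # resampling weights for the dataset
```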
The problem of data augmentation in feature space is considered. A new architecture, denoted the FeATure TransfEr Network (FATTEN), is proposed for modeling feature trajectories induced by variations of object pose. This architecture exploits a parametrization of the pose manifold in terms of pose and appearance. This leads to a deep encoder/decoder network architecture, where the encoder factors into an appearance and a pose predictor. Unlike previous attempts at trajectory transfer, FATTEN can be efficiently trained end-to-end, with no need to train separate feature transfer functions. This is realized by supplying the decoder with information about a target pose and the use of a multi-task loss that penalizes category- and pose-mismatches. As a result, FATTEN discourages discontinuous or non-smooth trajectories that fail to capture the structure of the pose manifold, and generalizes well on object recognition tasks involving large pose variation. Experimental results on the artificial ModelNet database show that it can successfully learn to map source features to target features of a desired pose, while preserving class identity. Most notably, by using feature space transfer for data augmentation (w.r.t. pose and depth) on SUN-RGBD objects, we demonstrate considerable performance improvements on one/few-shot object recognition in a transfer learning setup, compared to current state-of-the-art methods.
http://arxiv.org/abs/1801.04356
Spoken question answering (SQA) is challenging due to the complex reasoning required on top of spoken documents. Recent studies have also shown the catastrophic impact of automatic speech recognition (ASR) errors on SQA. Therefore, this work proposes to mitigate ASR errors by aligning the mismatch between ASR hypotheses and their corresponding reference transcriptions. An adversarial model is applied to this domain adaptation task, forcing the model to learn domain-invariant features that the QA model can effectively utilize to improve SQA results. The experiments successfully demonstrate the effectiveness of our proposed model, and the results surpass the previous best model by 2% in EM score.
http://arxiv.org/abs/1904.07904
Histopathologic Images (HI) are the gold standard for the evaluation of some tumors. However, the analysis of such images is challenging even for experienced pathologists, resulting in inter- and intra-observer variability. Besides that, the analysis is time- and resource-consuming. One way to accelerate such an analysis is by using Computer Aided Diagnosis systems. In this work we present a literature review of the computational techniques used to process HI, including shallow and deep methods. We cover the most common tasks for processing HI, such as segmentation, feature extraction, unsupervised learning, and supervised learning. A dataset section presents the datasets found during the literature review. We also present a case study of breast cancer classification using a mix of deep and shallow machine learning methods. The proposed method obtained an accuracy of 91% in the best case, outperforming the compared baseline for the dataset.
http://arxiv.org/abs/1904.07900
The combination of deep neural network models and reinforcement learning algorithms can make it possible to learn policies for robotic behaviors that directly read in raw sensory inputs, such as camera images, effectively subsuming both estimation and control into one model. However, real-world applications of reinforcement learning must specify the goal of the task by means of a manually programmed reward function, which in practice requires either designing the very same perception pipeline that end-to-end reinforcement learning promises to avoid, or else instrumenting the environment with additional sensors to determine if the task has been performed successfully. In this paper, we propose an approach for removing the need for manual engineering of reward specifications by enabling a robot to learn from a modest number of examples of successful outcomes, followed by actively solicited queries, where the robot shows the user a state and asks for a label to determine whether that state represents successful completion of the task. While requesting labels for every single state would amount to asking the user to manually provide the reward signal, our method requires labels for only a tiny fraction of the states seen during training, making it an efficient and practical approach for learning skills without manually engineered rewards. We evaluate our method on real-world robotic manipulation tasks where the observations consist of images viewed by the robot’s camera. In our experiments, our method effectively learns to arrange objects, place books, and drape cloth, directly from images and without any manually specified reward functions, and with only 1-4 hours of interaction with the real world.
http://arxiv.org/abs/1904.07854
This paper is on improving the training of binary neural networks, in which both activations and weights are binary. While prior methods for neural network binarization binarize each filter independently, we propose to instead parametrize the weight tensor of each layer using matrix or tensor decomposition. The binarization process is then performed using this latent parametrization, via a quantization function (e.g. the sign function) applied to the reconstructed weights. A key feature of our method is that while the reconstruction is binarized, the computation in the latent factorized space is done in the real domain. This has several advantages: (i) the latent factorization enforces a coupling of the filters before binarization, which significantly improves the accuracy of the trained models; (ii) while at training time the binary weights of each convolutional layer are parametrized using a real-valued matrix or tensor decomposition, during inference we simply use the reconstructed (binary) weights. As a result, our method does not sacrifice any advantage of binary networks in terms of model compression and speeding up inference. As a further contribution, instead of computing the binary weight scaling factors analytically, as in prior work, we propose to learn them discriminatively via back-propagation. Finally, we show that our approach significantly outperforms existing methods when tested on the challenging tasks of (a) human pose estimation (more than 4% improvement) and (b) ImageNet classification (up to 5% performance gains).
http://arxiv.org/abs/1904.07852
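To make the latent-parametrization idea concrete, here is a hedged sketch (shapes, rank, and initialization are our assumptions) of a convolutional layer whose binary weights come from a trained real-valued factorization, binarized with a straight-through sign and a learned per-filter scale.

```python
# A hedged sketch of binarization through a latent factorization: the
# real-valued factors U, V are trained jointly, coupling all filters; the
# layer applies sign(U @ V) with a learned scale, back-propagated through a
# straight-through estimator.
import torch
import torch.nn.functional as F

class LatentBinaryConv(torch.nn.Module):
    def __init__(self, in_ch, out_ch, k=3, rank=8):
        super().__init__()
        self.U = torch.nn.Parameter(torch.randn(out_ch, rank) * 0.1)
        self.V = torch.nn.Parameter(torch.randn(rank, in_ch * k * k) * 0.1)
        self.alpha = torch.nn.Parameter(torch.ones(out_ch, 1, 1, 1))  # learned scale
        self.shape = (out_ch, in_ch, k, k)

    def forward(self, x):
        w_real = (self.U @ self.V).view(self.shape)  # latent, couples filters
        # Straight-through sign: binary forward pass, identity backward pass.
        w_bin = w_real + (torch.sign(w_real) - w_real).detach()
        return F.conv2d(x, self.alpha * w_bin, padding=self.shape[-1] // 2)
```

At inference time, sign(U @ V) would be precomputed once and the real-valued factors discarded, so the compression and speed advantages of binary networks are preserved.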
Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point: the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center point based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding box based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding boxes on the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real time.
http://arxiv.org/abs/1904.07850
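The decoding step that replaces box enumeration and NMS can be illustrated with a short sketch in the spirit of CenterNet; tensor layouts are assumed, and the real model also predicts a sub-pixel offset head, omitted here.

```python
# A decoding sketch in the spirit of CenterNet (tensor layouts assumed):
# local peaks of the class heatmaps become detections, read out with the
# size head; a 3x3 max-pool stands in for NMS.
import torch
import torch.nn.functional as F

def decode(heatmap, wh, top_k=100):
    """heatmap: (C, H, W) per-class center scores; wh: (2, H, W) box sizes."""
    C, H, W = heatmap.shape
    hmax = F.max_pool2d(heatmap[None], 3, stride=1, padding=1)[0]
    heatmap = heatmap * (hmax == heatmap)        # keep local maxima only
    scores, idx = heatmap.flatten().topk(top_k)  # best peaks across classes
    cls = torch.div(idx, H * W, rounding_mode='floor')
    ys = torch.div(idx % (H * W), W, rounding_mode='floor')
    xs = idx % W
    w, h = wh[0, ys, xs], wh[1, ys, xs]
    xs, ys = xs.float(), ys.float()
    boxes = torch.stack([xs - w / 2, ys - h / 2, xs + w / 2, ys + h / 2], dim=1)
    return boxes, scores, cls
```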
We propose an active learning approach for transferring representations across domains. Our approach, active adversarial domain adaptation (AADA), explores a duality between two related problems: adversarial domain alignment and importance sampling for adapting models across domains. The former uses a domain discriminative model to align domains, while the latter utilizes it to weigh samples to account for distribution shifts. Specifically, our importance weight promotes samples with large uncertainty in classification and diversity from labeled examples, and thus serves as a sample selection scheme for active learning. We show that these two views can be unified in one framework for domain adaptation and transfer learning when the source domain has many labeled examples while the target domain does not. AADA provides significant improvements over fine-tuning based approaches and other sampling methods when the two domains are closely related. Results on challenging domain adaptation tasks, e.g., object detection, demonstrate that the advantage over baseline approaches is retained even after hundreds of examples have been actively annotated.
http://arxiv.org/abs/1904.07848
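A hedged sketch of an AADA-style selection score follows (our reading of the abstract; names and the exact functional form are assumptions): the domain discriminator's output yields an importance weight favouring target-like samples, multiplied by predictive entropy to capture classification uncertainty.

```python
# A hedged sketch of an AADA-style sample-selection score (names assumed):
# importance weight from the domain discriminator times predictive entropy.
import torch

def aada_scores(p_source, class_probs):
    """p_source: (N,) discriminator probability that a sample looks source-like;
    class_probs: (N, C) classifier softmax outputs on unlabeled target data."""
    importance = (1.0 - p_source) / p_source.clamp_min(1e-6)  # domainness weight
    entropy = -(class_probs * (class_probs + 1e-12).log()).sum(dim=1)
    return importance * entropy

# Hypothetical selection of a labeling budget's worth of samples:
#   picked = aada_scores(p_src, probs).topk(budget).indices
```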
We introduce a self-supervised representation learning method based on the task of temporal alignment between videos. The method trains a network using temporal cycle consistency (TCC), a differentiable cycle-consistency loss that can be used to find correspondences across time in multiple videos. The resulting per-frame embeddings can be used to align videos by simply matching frames using the nearest-neighbors in the learned embedding space. To evaluate the power of the embeddings, we densely label the Pouring and Penn Action video datasets for action phases. We show that (i) the learned embeddings enable few-shot classification of these action phases, significantly reducing the supervised training requirements; and (ii) TCC is complementary to other methods of self-supervised learning in videos, such as Shuffle and Learn and Time-Contrastive Networks. The embeddings are also used for a number of applications based on alignment (dense temporal correspondence) between video pairs, including transfer of metadata of synchronized modalities between videos (sounds, temporal semantic labels), synchronized playback of multiple videos, and anomaly detection. Project webpage: https://sites.google.com/view/temporal-cycle-consistency .
http://arxiv.org/abs/1904.07846
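The cycle-consistency idea above lends itself to a compact sketch. The following is a simplified cycle-back classification variant (the temperature and similarity choice are our assumptions, not the exact TCC loss): each frame's soft nearest neighbour in the other video must cycle back to the frame it started from.

```python
# A simplified cycle-back classification sketch of temporal cycle
# consistency: embed both videos, take soft nearest neighbours across
# videos, and require the cycle to return to the starting frame index.
import torch
import torch.nn.functional as F

def tcc_loss(u, v, temperature=0.1):
    """u: (N, D) and v: (M, D) per-frame embeddings of two videos."""
    sim_uv = -torch.cdist(u, v) / temperature         # frame similarities (N, M)
    v_soft = F.softmax(sim_uv, dim=1) @ v             # soft neighbour in v (N, D)
    sim_back = -torch.cdist(v_soft, u) / temperature  # cycle back to u (N, N)
    target = torch.arange(u.size(0))                  # should land on frame i
    return F.cross_entropy(sim_back, target)
```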
Speech separation has seen great success with deep learning techniques. Substantial effort has been devoted to approaches based on the spectrogram, the standard time-frequency representation of speech signals. The spectrogram is highly correlated with the phonetic structure of speech, or “how the speech sounds” when perceived by humans, but it consists primarily of frequency-domain features that carry temporal behaviour. Very impressive speech separation results over the time domain were reported recently, probably because time-domain waveforms describe the different realizations of speech more precisely than the spectrogram. In this paper, we propose a framework that properly integrates these two directions, hoping to achieve both purposes. We construct a time-and-frequency feature map by concatenating a 1-D convolution-encoded feature map (for the time domain) with the spectrogram (for the frequency domain), which is then processed by an embedding network and clustering approaches very similar to those used in prior time-domain and frequency-domain work. In this way, the information in the time and frequency domains, as well as the interactions between them, can be jointly considered during embedding and clustering. Very encouraging results (state-of-the-art to our knowledge) were obtained on the WSJ0-2mix dataset in preliminary experiments.
http://arxiv.org/abs/1904.07845
In this paper we revisit the problem of automatically identifying hate speech in posts from social media. We approach the task using a system based on minimalistic compositional Recurrent Neural Networks (RNN). We tested our approach on the SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter (HatEval) shared task dataset. The dataset made available by the HatEval organizers contained English and Spanish posts retrieved from Twitter annotated with respect to the presence of hateful content and its target. In this paper we present the results obtained by our system in comparison to the other entries in the shared task. Our system achieved competitive performance ranking 7th in sub-task A out of 62 systems in the English track.
http://arxiv.org/abs/1904.07839
To help future mobile agents plan their movement in harsh environments, a predictive model has been designed to determine which areas would be favorable for Global Navigation Satellite System (GNSS) positioning. The model is able to predict the number of viable satellites for a GNSS receiver, based on a 3D point cloud map and a satellite constellation. Both occlusion and absorption effects of the environment are considered. A rugged mobile platform was designed to collect data in order to generate the point cloud maps. It was deployed during the Canadian winter, known for large amounts of snow and extremely low temperatures. The test environments include a highly dense boreal forest and a university campus with tall buildings. The experimental results indicate that the model performs well in both structured and unstructured environments.
http://arxiv.org/abs/1904.07837
This work proposes a classification approach for breast cancer histopathologic images (HI) that uses transfer learning to extract features with an Inception-v3 CNN pre-trained on the ImageNet dataset. We also use transfer learning to train a support vector machine (SVM) classifier on a tissue-labeled colorectal cancer dataset, aiming to filter the patches from a breast cancer HI and remove the irrelevant ones. We show that removing irrelevant patches before training a second SVM classifier improves the accuracy of classifying malignant and benign tumors in breast cancer images. We improve the classification accuracy by 3.7% using feature-extraction transfer learning and by an additional 0.7% using irrelevant-patch elimination. The proposed approach outperforms the state of the art in three out of the four magnification factors of the breast cancer dataset.
http://arxiv.org/abs/1904.07834
This paper studies the effectiveness of text representation schemes on two tasks, namely User Aggression detection and Fact Detection, from social media content. In User Aggression detection, the aim is to identify the level of aggression in content generated on social media and written in English, Devanagari Hindi, and Romanized Hindi. Aggression levels are categorized into three predefined classes: Non-aggressive, Overtly Aggressive, and Covertly Aggressive. During disaster-related incidents, social media platforms such as Twitter are flooded with millions of posts. In such emergency situations, identifying factual posts is important for organizations involved in relief operations. We formulate this problem as a combination of classification and ranking. This paper presents a comparison of various text representation schemes, based on BoW techniques, distributed word/sentence representations, and transfer learning, across classifiers. The weighted $F_1$ score is used as the primary evaluation metric. Results show that BoW text representations perform better than word embeddings with machine learning classifiers, while pre-trained word embeddings perform better with classifiers based on deep neural nets. Recent transfer learning models such as ELMo and ULMFiT were fine-tuned for the aggression classification task; however, their results were not on par with the pre-trained word embedding models. Overall, fastText word embeddings produce a better weighted $F_1$ score than Word2Vec and GloVe, and results are further improved using pre-trained vector models. Statistical significance tests are employed to ensure the significance of the classification results. When the test dataset is lexically different from the training dataset, deep neural models are more robust and perform substantially better than machine learning classifiers.
http://arxiv.org/abs/1904.08770
Images and text co-occur everywhere on the web, but explicit links between images and sentences (or other intra-document textual units) are often not annotated by users. We present algorithms that successfully discover image-sentence relationships without relying on any explicit multimodal annotation. We explore several variants of our approach on seven datasets of varying difficulty, ranging from images that were captioned post hoc by crowd-workers to naturally-occurring user-generated multimodal documents, wherein correspondences between illustrations and individual textual units may not be one-to-one. We find that a structured training objective based on identifying whether sets of images and sentences co-occur in documents can be sufficient to predict links between specific sentences and specific images within the same document at test time.
http://arxiv.org/abs/1904.07826
An application of decision support systems to conflict modeling in information operations recognition is presented. An information operation is considered as a complex, weakly structured system. A model of conflict between two subjects is proposed, based on a second-order rank reflexive model. A method is described for constructing design patterns for the knowledge bases of decision support systems. In the talk, a methodology is proposed for using decision support systems to model conflicts in information operations recognition, based on the use of expert knowledge and content monitoring.
http://arxiv.org/abs/1904.08303
We present Simion Zoo, a Reinforcement Learning (RL) workbench that provides a complete set of tools to design, run, and analyze the results, both statistically and visually, of RL control applications. The main features that set Simion Zoo apart from similar software packages are its easy-to-use GUI, its support for distributed execution, including deployment on graphics processing units (GPUs), and the ability to concurrently explore the RL metaparameter space, which is key to successful RL experimentation.
http://arxiv.org/abs/1904.07817
Image compression-based approaches for defending against adversarial-example attacks, which threaten the safe use of deep neural networks (DNNs), have been investigated recently. However, prior works mainly rely on directly tuning parameters such as the compression rate to blindly reduce image features, and therefore lack guarantees on both defense efficiency (i.e., accuracy on polluted images) and classification accuracy on benign images after applying defense methods. To overcome these limitations, we propose a JPEG-based defensive compression framework, namely “feature distillation”, to effectively rectify adversarial examples without impacting classification accuracy on benign data. Our framework significantly escalates defense efficiency with marginal accuracy reduction using a two-step method: first, we maximize the filtering of malicious input perturbations by developing defensive quantization in the frequency domain of JPEG compression/decompression, guided by a semi-analytical method; second, we suppress the distortion of benign features to restore classification accuracy through a DNN-oriented quantization refinement process. Our experimental results show that the proposed “feature distillation” significantly surpasses the latest input-transformation-based mitigations, such as Quilting and TV Minimization, in three aspects: defense efficiency (improving classification accuracy from $\sim20\%$ to $\sim90\%$ on adversarial examples), accuracy on benign images after defense ($\le1\%$ accuracy degradation), and processing time per image ($\sim259\times$ speedup). Moreover, our solution provides the best defense efficiency ($\sim60\%$ accuracy) against the recent adaptive attack with the least accuracy reduction ($\sim1\%$) on benign images compared with other input-transformation-based defense methods.
http://arxiv.org/abs/1803.05787
The ability to map challenging sub-arctic environments opens new horizons for robotic deployments in industries such as forestry, surveillance, and open-pit mining. In this paper, we explore possibilities of large-scale lidar mapping in a boreal forest. Computational and sensory requirements with regard to contemporary hardware are considered as well. Lidar mapping is often based on SLAM relying on pose graph optimization, which fuses the Iterative Closest Point (ICP) algorithm, Global Navigation Satellite System (GNSS) positioning, and Inertial Measurement Unit (IMU) measurements. To handle these sensors directly within the ICP minimization process, we propose an alternative approach of embedding external constraints. Furthermore, a novel formulation of a cost function is presented and cast into the problem of handling uncertainties from GNSS and lidar points. To test our approach, we acquired a large-scale dataset in the Forêt Montmorency research forest. We report on the technical problems faced during our winter deployments aimed at building 3D maps using our new cost function. These maps demonstrate both global and local consistency over 4.1 km.
http://arxiv.org/abs/1904.07814
Using touch devices to navigate in virtual 3D environments, such as computer-assisted design (CAD) models or geographical information systems (GIS), is inherently difficult for humans, as the 3D operations have to be performed by the user on a 2D touch surface. This ill-posed problem is classically solved with a fixed and handcrafted interaction protocol, which must be learned by the user. We propose to automatically learn a new interaction protocol that maps 2D user input to 3D actions in virtual environments using reinforcement learning (RL). A fundamental problem of RL methods is the vast number of interactions often required, which are difficult to come by when humans are involved. To overcome this limitation, we make use of two collaborative agents. The first agent models the human by learning to perform the 2D finger trajectories. The second agent acts as the interaction protocol, interpreting and translating the 2D finger trajectories from the first agent into 3D operations. We restrict the learned 2D trajectories to be similar to a training set of collected human gestures by first performing state representation learning, prior to reinforcement learning. This state representation learning is addressed by projecting the gestures into a latent space learned by a variational autoencoder (VAE).
http://arxiv.org/abs/1904.07802
Visual relationship detection is an intermediate image understanding task that detects two objects and classifies the predicate that explains the relationship between them in an image. The three components are linguistically and visually correlated (e.g. “wear” is related to “person” and “shirt”, while “laptop” is related to “table” and “on”); thus, the solution space is huge because there are many possible cases between them. Language and visual modules are exploited and a sophisticated spatial vector is proposed. The models in this work outperform the state of the art without costly linguistic knowledge distillation from a large text corpus or complex loss functions. All experiments were evaluated on the Visual Relationship Detection and Visual Genome datasets.
http://arxiv.org/abs/1904.07798
Recent studies have discovered the vulnerability of Deep Neural Networks (DNNs) to adversarial examples, which are imperceptible to humans but can easily fool DNNs. Existing methods for crafting adversarial examples are mainly based on adding small-magnitude perturbations to the original images, so that the generated adversarial examples are constrained to lie within a small norm of the benign examples. In this work, we propose a new attack method called AT-GAN that directly generates adversarial examples from random noise using generative adversarial nets (GANs). The key idea is to transfer a pre-trained GAN to generate adversarial examples for the target classifier to be attacked. Once the model is transferred for attack, AT-GAN can generate diverse adversarial examples efficiently, which can potentially help accelerate adversarial training for defenses. We evaluate AT-GAN in both semi-whitebox and black-box settings under typical defense methods on the MNIST handwritten digit database. Empirical comparisons with existing attack baselines demonstrate that AT-GAN can achieve a higher attack success rate.
http://arxiv.org/abs/1904.07793
Herein, a bit-wise Convolutional Neural Network (CNN) in-memory accelerator is implemented using Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) computational sub-arrays. It utilizes a novel AND-Accumulation method capable of significantly-reduced energy consumption within convolutional layers and performs various low bit-width CNN inference operations entirely within MRAM. Power-intermittence resiliency is also enhanced by retaining the partial state information needed to maintain computational forward-progress, which is advantageous for battery-less IoT nodes. Simulation results indicate $\sim$5.4$\times$ higher energy-efficiency and 9$\times$ speedup over ReRAM-based acceleration, or roughly $\sim$9.7$\times$ higher energy-efficiency and 13.5$\times$ speedup over recent CMOS-only approaches, while maintaining inference accuracy comparable to baseline designs.
https://arxiv.org/abs/1904.07864
Similar to their counterparts in nature, the flexible bodies of snake-like robots enhance their movement capability and adaptability in diverse environments. However, this flexibility corresponds to a complex control task involving highly redundant degrees of freedom, where traditional model-based methods usually fail to propel the robots energy-efficiently. In this work, we present a novel approach for designing an energy-efficient slithering gait for a snake-like robot using a model-free reinforcement learning (RL) algorithm. Specifically, we present an RL-based controller for generating locomotion gaits at a wide range of velocities, which is trained using the proximal policy optimization (PPO) algorithm. Meanwhile, a traditional parameterized gait controller is presented and the parameter sets are optimized using the grid search and Bayesian optimization algorithms for the purposes of reasonable comparisons. Based on the analysis of the simulation results, we demonstrate that this RL-based controller exhibits very natural and adaptive movements, which are also substantially more energy-efficient than the gaits generated by the parameterized controller. Videos are shown at https://videoviewsite.wixsite.com/rlsnake.
http://arxiv.org/abs/1904.07788
This paper uses a branching classifier mechanism in an unsupervised scenario, to enable it to self-organise data into unknown categories. A teaching phase is then able to help the classifier learn the true category for each input row, using a reduced number of training steps. The pattern ensembles are learned in an unsupervised manner using closest-distance clustering. This is done without knowing the actual output category, and it leads to each actual category having several clusters associated with it. One measure of success is then that each of these sub-clusters is coherent, meaning that every data row in the cluster belongs to the same category. The total number of clusters is also important, and a teaching phase can then teach the classifier the correct actual category. During this phase, any classifier can also learn or infer correct classifications from some other classifier’s knowledge, thereby reducing the required number of presentations. As the information is added, cross-referencing between the two structures allows it to be used more widely. With this process, a unique structure can build up that would not be possible by either method separately. The lower level is a nested ensemble of patterns created by self-organisation. The upper level is a hierarchical tree, where each end node represents a single category only, so there is a transition from mixed ensemble masses to specific categories. The structure also has relations to brain-like modelling.
http://arxiv.org/abs/1904.07786
Cross-lingual Text Classification (CLC) consists of automatically classifying, according to a common set C of classes, documents each written in one of a set of languages L, and doing so more accurately than when naively classifying each document via its corresponding language-specific classifier. In order to obtain an increase in the classification accuracy for a given language, the system thus needs to also leverage the training examples written in the other languages. We tackle multilabel CLC via funnelling, a new ensemble learning method that we propose here. Funnelling consists of generating a two-tier classification system where all documents, irrespective of language, are classified by the same (2nd-tier) classifier. For this classifier all documents are represented in a common, language-independent feature space consisting of the posterior probabilities generated by 1st-tier, language-dependent classifiers. This allows the classification of all test documents, of any language, to benefit from the information present in all training documents, of any language. We present substantial experiments, run on publicly available multilingual text collections, in which funnelling is shown to significantly outperform a number of state-of-the-art baselines. All code and datasets (in vector form) are made publicly available.
http://arxiv.org/abs/1901.11459
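A compact scikit-learn sketch of the two-tier funnelling idea follows. It is illustrative only: it shows the single-label case, omits the posterior calibration a faithful implementation would need, and assumes all languages share the same class set.

```python
# An illustrative sketch of funnelling: per-language 1st-tier classifiers
# map documents to posterior probabilities, which form a shared feature
# space for a single 2nd-tier classifier trained on all languages at once.
import numpy as np
from sklearn.svm import SVC

def train_funnelling(X_by_lang, y_by_lang):
    tier1, Z, Y = {}, [], []
    for lang, X in X_by_lang.items():
        clf = SVC(probability=True).fit(X, y_by_lang[lang])  # 1st tier
        tier1[lang] = clf
        Z.append(clf.predict_proba(X))   # language-independent posteriors
        Y.append(y_by_lang[lang])
    tier2 = SVC().fit(np.vstack(Z), np.concatenate(Y))  # 2nd tier, all languages
    return tier1, tier2

def predict(tier1, tier2, lang, X):
    return tier2.predict(tier1[lang].predict_proba(X))
```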
The rapid increase in the number of online videos provides the marketing and advertising agents ample opportunities to reach out to their audience. One of the most widely used strategies is product placement, or embedded marketing, wherein new advertisements are integrated seamlessly into existing advertisements in videos. Such strategies involve accurately localizing the position of the advert in the image frame, either manually in the video editing phase, or by using machine learning frameworks. However, these machine learning techniques and deep neural networks need a massive amount of data for training. In this paper, we propose and release the first large-scale dataset of advertisement billboards, captured in outdoor scenes. We also benchmark several state-of-the-art semantic segmentation algorithms on our proposed dataset.
http://arxiv.org/abs/1904.07776
Detecting the temporal extents of human actions in videos is a challenging computer vision problem that requires detailed manual supervision, including frame-level labels. This expensive annotation process limits the deployment of action detectors to a small number of categories. We propose a novel action recognition method, called WSGN, that learns to detect actions from “weak supervision”, i.e., video-level labels. WSGN learns to exploit both video-specific and dataset-wide statistics to predict the relevance of each frame to an action category. We show that a combination of the local and global channels leads to significant gains on two standard benchmarks, THUMOS14 and Charades. Our method improves mAP by more than 12% over a weakly supervised baseline, outperforms other weakly supervised state-of-the-art methods, and is only 4% behind the state-of-the-art supervised method on the THUMOS14 dataset for action detection. Similarly, our method is only 0.3% mAP behind a state-of-the-art supervised method on the challenging Charades dataset for action localisation.
http://arxiv.org/abs/1904.07774
Cryo-electron microscopy (EM) single particle reconstruction is an entirely general technique for 3D structure determination of macromolecular complexes. However, because the images are taken at low electron dose, it is extremely hard to visualize individual particles, which suffer from low contrast and high noise levels. In this paper, we propose a novel approach called multi-frequency vector diffusion maps (MFVDM) to improve the efficiency and accuracy of cryo-EM 2D image classification and denoising. This framework incorporates different irreducible representations of the estimated alignment between similar images. In addition, we propose a graph filtering scheme to denoise the images using the eigenvalues and eigenvectors of the MFVDM matrices. Through both simulated and publicly available real data, we demonstrate that our proposed method is efficient and robust to noise compared with the state-of-the-art cryo-EM 2D class averaging and image restoration algorithms.
http://arxiv.org/abs/1904.07772
Persistence diagrams are a main tool in the field of Topological Data Analysis (TDA). They contain fruitful information about the shape of data. The use of machine learning algorithms on the space of persistence diagrams proves to be challenging as the space is complicated. For that reason, summarizing and vectorizing these diagrams is an important topic currently researched in TDA. In this work, we provide a general framework for summarizing diagrams that we call Persistence Curves (PCs). The main idea builds on the so-called Fundamental Lemma of Persistent Homology, which is derived from the classic elder rule. Under this framework, certain well-known summaries, such as persistent Betti numbers and the persistence landscape, are special cases of PCs. Moreover, we prove a rigorous bound for general families of PCs; in particular, certain families of PCs admit the stability property under an additional assumption. Finally, we apply PCs to texture classification on four well-known texture datasets. The results outperform several existing TDA methods.
http://arxiv.org/abs/1904.07768
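One member of the PC family, the Betti curve, is simple enough to sketch directly (our reading of the framework, not the authors' code): for each filtration value t, count the diagram points (b, d) with b <= t < d.

```python
# A small sketch of one persistence curve, the Betti curve: sample, over a
# grid of filtration values, the number of features alive at each value.
import numpy as np

def betti_curve(diagram, grid):
    """diagram: (n, 2) array of (birth, death) pairs; grid: 1-D sample points."""
    b, d = diagram[:, 0][:, None], diagram[:, 1][:, None]
    return ((b <= grid) & (grid < d)).sum(axis=0)

diagram = np.array([[0.0, 1.0], [0.2, 0.8], [0.5, 2.0]])
print(betti_curve(diagram, np.linspace(0.0, 2.0, 9)))  # starts [1 2 3 ...]
```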
Learning how objects sound from video is challenging, since they often heavily overlap in a single audio channel. Current methods for visually-guided audio source separation sidestep the issue by training with artificially mixed video clips, but this puts unwieldy restrictions on training data collection and may even prevent learning the properties of “true” mixed sounds. We introduce a co-separation training paradigm that permits learning object-level sounds from unlabeled multi-source videos. Our novel training objective requires that the deep neural network’s separated audio for similar-looking objects be consistently identifiable, while simultaneously reproducing accurate video-level audio tracks for each source training pair. Our approach disentangles sounds in realistic test videos, even in cases where an object was not observed individually during training. We obtain state-of-the-art results on visually-guided audio source separation and audio denoising for the MUSIC, AudioSet, and AV-Bench datasets. Our video results: this http URL
http://arxiv.org/abs/1904.07750
The nature of what people enjoy is not just a central question for the creative industry; it is a driving force of cultural evolution. It is widely believed that successful cultural products balance novelty and conventionality: they provide something familiar but at least somewhat divergent from what has come before, occupying a satisfying middle ground between “more of the same” and “too strange”. We test this belief using a large dataset of over half a million works of fanfiction from the website Archive of Our Own (AO3), looking at how the recognition a work receives varies with its novelty. We quantify novelty through a term-based language model and a topic model, in the context of existing works within the same fandom. Contrary to the balance theory, we find that the lowest-novelty works are the most popular and that popularity declines monotonically with novelty. A few exceptions can be found: extremely popular works that are among the most novel within their fandom. Taken together, our findings not only challenge the traditional theory of the hedonic value of novelty, they invert it: people prefer the least novel things, are repelled by the middle ground, and have an occasional enthusiasm for extreme outliers. This suggests that cultural evolution must work against inertia (the appetite people have to continually reconsume the familiar) and may resemble a punctuated equilibrium rather than a smooth evolution.
http://arxiv.org/abs/1904.07741
In this paper we propose an end-to-end LSTM-based model that performs single-channel speech enhancement and phone recognition in a cocktail party scenario where visual information about the target speaker is available. In the speech enhancement phase the proposed system uses a “visual attention” signal of the speaker of interest to extract her speech from the input mixed-speech signal, while in the ASR phase it recognizes her phone sequence through a phone recognizer trained with a CTC loss. It is well known that learning multiple related tasks from data simultaneously can improve performance compared to learning these tasks independently; therefore we decided to train the model by optimizing both tasks at the same time. This also allowed us to explore whether (and how) this joint optimization leads to better results. We analyzed different training strategies that reveal some interesting and unexpected behaviors. In particular, the experiments demonstrate that during optimization of the ASR phase the speech enhancement capability of the model significantly decreases, and vice versa. We evaluated our approach on mixed-speech versions of GRID and TCD-TIMIT. The obtained results show a remarkable drop in the Phone Error Rate (PER) compared to audio-visual baseline models trained only to perform phone recognition.
http://arxiv.org/abs/1904.08248
This paper presents TextComplexityDE, a dataset consisting of 1000 sentences in the German language, taken from 23 Wikipedia articles in 3 different article genres, to be used for developing text-complexity prediction models and automatic text simplification in German. The dataset includes subjective assessments of different text-complexity aspects provided by German learners at levels A and B. In addition, it contains manual simplifications of 250 of those sentences provided by native speakers, together with subjective assessments of the simplified sentences by participants from the target group. The subjective ratings were collected using both laboratory studies and a crowdsourcing approach.
http://arxiv.org/abs/1904.07733
Zero shot learning (ZSL) aims to recognize unseen classes by exploiting semantic relationships between seen and unseen classes. Two major problems faced by ZSL algorithms are the hubness problem and the bias towards the seen classes. Existing ZSL methods focus on only one of these problems in the conventional and generalized ZSL setting. In this work, we propose a novel approach, Semantically Aligned Bias Reducing (SABR) ZSL, which focuses on solving both the problems. It overcomes the hubness problem by learning a latent space that preserves the semantic relationship between the labels while encoding the discriminating information about the classes. Further, we also propose ways to reduce the bias of the seen classes through a simple cross-validation process in the inductive setting and a novel weak transfer constraint in the transductive setting. Extensive experiments on three benchmark datasets suggest that the proposed model significantly outperforms existing state-of-the-art algorithms by ~1.5-9% in the conventional ZSL setting and by ~2-14% in the generalized ZSL for both the inductive and transductive settings.
http://arxiv.org/abs/1904.07659
Recognizing facial expressions is one of the central problems in computer vision. Temporal image sequences have useful spatio-temporal features for recognizing expressions. In this paper, we propose a new 3D Convolutional Neural Network (CNN) that can be trained end-to-end for facial expression recognition on temporal image sequences without using facial landmarks. More specifically, we propose a novel 3D convolutional layer that we call the Local Binary Volume (LBV) layer. The LBV layer, when used in our newly proposed LBVCNN network, achieves results comparable to state-of-the-art landmark-based and landmark-free models on image sequences from the CK+, Oulu-CASIA, and UNBC McMaster shoulder pain datasets. Furthermore, the LBV layer reduces the number of trainable parameters by a significant amount compared to a conventional 3D convolutional layer; in fact, compared to a 3x3x3 conventional 3D convolutional layer, the LBV layer uses 27 times fewer trainable parameters.
http://arxiv.org/abs/1904.07647
In this paper, we aim at automatically searching for an efficient network architecture for dense image prediction. In particular, we follow the encoder-decoder style and focus on automatically designing a connectivity structure for the decoder. To achieve that, we first design a densely connected network with learnable connections, named Fully Dense Network, which contains a large set of possible final connectivity structures. We then employ gradient descent to search for the optimal connectivity among the dense connections. The search process is guided by a novel loss function, which pushes the weight of each connection to be binary and the connections to be sparse. The discovered connectivity achieves competitive results on two segmentation datasets, while running more than three times faster and requiring less than half the parameters of state-of-the-art methods. Extensive experiments show that the discovered connectivity is compatible with various backbones and generalizes well to other dense image prediction tasks.
http://arxiv.org/abs/1904.07642
Causality extraction from natural language texts is a challenging open problem in artificial intelligence. Existing methods utilize patterns, constraints, and machine learning techniques to extract causality; they depend heavily on domain knowledge and require considerable human effort and time for feature engineering. In this paper, we formulate causality extraction as a sequence tagging problem based on a novel causality tagging scheme. On this basis, we propose a neural causality extractor with a BiLSTM-CRF model as the backbone, named SCIFI (Self-Attentive BiLSTM-CRF with Flair Embeddings), which can directly extract Cause and Effect without extracting candidate causal pairs and identifying their relations separately. To tackle the problem of data insufficiency, we transfer contextual string embeddings, also known as Flair embeddings, trained on a large corpus, to our task. Besides, to improve the performance of causality extraction, we introduce a multi-head self-attention mechanism into SCIFI to learn the dependencies between causal words. We evaluate our method on a public dataset, and experimental results demonstrate that our method achieves significant and consistent improvements over other baselines.
http://arxiv.org/abs/1904.07629
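The tagging formulation can be made concrete with a small, self-made example of BIO-style Cause/Effect labels (the exact label inventory of the paper's scheme may differ):

```python
# An illustrative example (our own, simplified) of casting causality
# extraction as sequence tagging: a BiLSTM-CRF tagger is trained to predict
# one Cause/Effect tag per token.
sentence = ["The", "storm", "caused", "severe", "flooding", "downtown", "."]
tags     = ["B-C", "I-C",   "O",      "B-E",    "I-E",      "I-E",      "O"]

# The CRF layer scores whole tag sequences, so ill-formed transitions
# (e.g. O -> I-E) are penalised jointly rather than per token.
for token, tag in zip(sentence, tags):
    print(f"{token}\t{tag}")
```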
The Variational Autoencoder (VAE) is a powerful architecture capable of representation learning and generative modeling. When it comes to learning interpretable (disentangled) representations, VAE and its variants show unparalleled performance. However, the reasons for this are unclear, since a very particular alignment of the latent embedding is needed but the design of the VAE does not encourage it in any explicit way. We address this matter and offer the following explanation: the diagonal approximation in the encoder together with the inherent stochasticity force local orthogonality of the decoder. The local behavior of promoting both reconstruction and orthogonality matches closely how the PCA embedding is chosen. Alongside providing an intuitive understanding, we justify the statement with full theoretical analysis as well as with experiments.
http://arxiv.org/abs/1812.06775
Compared with machines, humans are far better learners, able to grasp a new concept very quickly from only a few samples. A plausible explanation for this difference lies in two fundamental learning mechanisms: learning to learn and learning by analogy. In this paper, we attempt to investigate a new human-like learning method by organically combining these two mechanisms. In particular, we study how to generalize the classification parameters of previously learned concepts to a new concept. We first propose a novel Visual Analogy Graph Embedded Regression (VAGER) model to jointly learn a low-dimensional embedding space and a linear mapping function from the embedding space to the classification parameters of base classes. We then propose an out-of-sample embedding method to learn the embedding of a new class, represented by a few samples, through its visual analogy with the base classes, and derive the classification parameters for the new class. We conduct extensive experiments on the ImageNet dataset, and the results show that our method consistently and significantly outperforms state-of-the-art baselines.
http://arxiv.org/abs/1710.06177
We show that denoising of 3D point clouds can be learned unsupervised, directly from noisy 3D point cloud data alone. This is achieved by extending recent ideas from learning unsupervised image denoisers to unstructured 3D point clouds. Unsupervised image denoisers operate under the assumption that a noisy pixel observation is a random realization of a distribution around a clean pixel value, which allows appropriate learning on this distribution to eventually converge to the correct value. Regrettably, this assumption is not valid for unstructured points: 3D point clouds are subject to total noise, i.e., deviations in all coordinates, with no reliable pixel grid. Thus, an observation can be the realization of an entire manifold of clean 3D points, which makes a naïve extension of unsupervised image denoisers to 3D point clouds impractical. To overcome this, we introduce a spatial prior term that steers convergence to the unique closest mode among the many possible modes on a manifold. Our results demonstrate unsupervised denoising performance similar to that of supervised learning with clean data when given enough training examples, without requiring any pairs of noisy and clean training data.
http://arxiv.org/abs/1904.07615
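As a loose sketch of the objective described above (our reading; the actual prior and training setup differ in detail), the denoiser regresses each point toward nearby noisy observations, with a spatial weighting that favours the closest mode:

```python
# A loose conceptual sketch (not the paper's exact loss): pull each denoised
# point toward its noisy neighbours, weighting nearer neighbours more so
# training converges to the closest mode of the underlying surface.
import torch

def unsupervised_denoise_loss(pred, noisy_neighbours, sigma=0.05):
    """pred: (N, 3) denoised positions; noisy_neighbours: (N, K, 3)."""
    diff = noisy_neighbours - pred[:, None, :]
    dist2 = diff.pow(2).sum(-1)                              # (N, K)
    # Spatial prior: weight close noisy points more (stop-gradient so the
    # prior selects the mode rather than being optimised itself).
    prior = torch.softmax(-dist2.detach() / (2 * sigma ** 2), dim=-1)
    return (prior * dist2).sum(-1).mean()
```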
We present a method for audio denoising that combines processing done in both the time domain and the time-frequency domain. Given a noisy audio clip, the method trains a deep neural network to fit this signal. Since the fitting is only partly successful, capturing the underlying clean signal better than the noise, the output of the network helps to disentangle the clean audio from the rest of the signal. The method is completely unsupervised and only trains on the specific audio clip that is being denoised. Our experiments demonstrate favorable performance in comparison to the literature methods, and our code and audio samples are available at https://github.com/mosheman5/DNP. Index Terms: audio denoising; unsupervised learning
http://arxiv.org/abs/1904.07612
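The core fit-the-noisy-clip mechanism can be sketched in a few lines. The architecture, input choice, and hyperparameters below are our assumptions, and the actual method adds time-frequency processing beyond this sketch.

```python
# A hedged sketch of the deep-prior mechanism: fit a small network to the
# single noisy clip; the partial fit captures the clean signal before the
# noise and serves as the denoised estimate.
import torch
import torch.nn.functional as F

def denoise(noisy, steps=2000, lr=1e-3):
    """noisy: (1, 1, T) waveform tensor of the clip to denoise."""
    net = torch.nn.Sequential(
        torch.nn.Conv1d(1, 32, 15, padding=7), torch.nn.ReLU(),
        torch.nn.Conv1d(32, 32, 15, padding=7), torch.nn.ReLU(),
        torch.nn.Conv1d(32, 1, 15, padding=7),
    )
    z = torch.randn_like(noisy)          # fixed random input, deep-prior style
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(net(z), noisy)
        opt.zero_grad(); loss.backward(); opt.step()
    return net(z).detach()               # partial fit approximates clean audio
```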
Point cloud analysis is very challenging, as the shape implied in irregular points is difficult to capture. In this paper, we propose RS-CNN, the Relation-Shape Convolutional Neural Network, which extends regular grid CNNs to irregular configurations for point cloud analysis. The key to RS-CNN is learning from relations, i.e., the geometric topology constraints among points. Specifically, the convolutional weight for a local point set is forced to learn a high-level relation expression from predefined geometric priors between a point sampled from this set and the others. In this way, an inductive local representation with explicit reasoning about the spatial layout of points can be obtained, which leads to strong shape awareness and robustness. With this convolution as a basic operator, a hierarchical architecture, RS-CNN, can be developed to achieve contextual shape-aware learning for point cloud analysis. Extensive experiments on challenging benchmarks across three tasks verify that RS-CNN achieves state-of-the-art performance.
http://arxiv.org/abs/1904.07601
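Schematically, relation-shape convolution replaces a fixed kernel with weights generated from geometric relations. A hedged PyTorch sketch follows; the relation vector and the aggregation are our assumptions, not the paper's exact design.

```python
# A schematic sketch of relation-shape convolution: per-neighbour weights
# are generated by a shared MLP from low-level geometric relations between
# a sampled point and each of its neighbours, then applied to features.
import torch

class RelationConv(torch.nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Maps a 10-d relation (centre, neighbour, offset, distance) to a
        # per-neighbour weight over input channels.
        self.weight_net = torch.nn.Sequential(
            torch.nn.Linear(10, 32), torch.nn.ReLU(),
            torch.nn.Linear(32, in_ch),
        )
        self.proj = torch.nn.Linear(in_ch, out_ch)

    def forward(self, centre, neighbours, feats):
        """centre: (B, 3); neighbours: (B, K, 3); feats: (B, K, C_in)."""
        offset = neighbours - centre[:, None, :]
        dist = offset.norm(dim=-1, keepdim=True)
        rel = torch.cat([centre[:, None, :].expand_as(neighbours),
                         neighbours, offset, dist], dim=-1)   # (B, K, 10)
        w = self.weight_net(rel)                              # learned weights
        return self.proj((w * feats).max(dim=1).values)       # aggregate over K
```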
Classical semantic segmentation methods, including the recent deep learning ones, assume that all classes observed at test time have been seen during training. In this paper, we tackle the more realistic scenario where unexpected objects of unknown classes can appear at test time. The main trends in this area either leverage the notion of prediction uncertainty to flag the regions with low confidence as unknown, or rely on autoencoders and highlight poorly-decoded regions. Having observed that, in both cases, the detected regions typically do not correspond to unexpected objects, in this paper, we introduce a drastically different strategy: It relies on the intuition that the network will produce spurious labels in regions depicting unexpected objects. Therefore, resynthesizing the image from the resulting semantic map will yield significant appearance differences with respect to the input image. In other words, we translate the problem of detecting unknown classes to one of identifying poorly-resynthesized image regions. We show that this outperforms both uncertainty- and autoencoder-based methods.
http://arxiv.org/abs/1904.07595
Image matting is generally modeled as a space transform from the color space to the alpha space. By estimating the alpha factor of the model, the foreground of an image can be extracted. However, there is some dimensional information redundancy in the alpha space, which often leads to the misjudgment of pixels near the boundary between the foreground and the background. In this paper, a manifold matting framework named Patch Alignment Manifold Matting is proposed for image matting. In particular, we first propose a part modeling of the color space in the local image patch. We then perform whole-alignment optimization to approximate the alpha results using the subspace reconstruction error. Furthermore, we utilize Nesterov’s algorithm to solve the optimization problem. Finally, we apply several manifold learning methods within the framework, obtaining several image matting methods, such as ISOMAP matting and its derived Cascade ISOMAP matting. The experimental results reveal that the manifold matting framework and its two instantiations are effective when compared with several representative matting methods.
http://arxiv.org/abs/1904.07588
We extend description logics (DLs) with non-monotonic reasoning features. We start by investigating a notion of defeasible subsumption in the spirit of defeasible conditionals as studied by Kraus, Lehmann and Magidor in the propositional case. In particular, we consider a natural and intuitive semantics for defeasible subsumption, and investigate KLM-style syntactic properties for both preferential and rational subsumption. Our contribution includes two representation results linking our semantic constructions to the set of preferential and rational properties considered. Besides showing that our semantics is appropriate, these results pave the way for more effective decision procedures for defeasible reasoning in DLs. Indeed, we also analyse the problem of non-monotonic reasoning in DLs at the level of entailment and present an algorithm for the computation of rational closure of a defeasible ontology. Importantly, our algorithm relies completely on classical entailment and shows that the computational complexity of reasoning over defeasible ontologies is no worse than that of reasoning in the underlying classical DL ALC.
http://arxiv.org/abs/1904.07559
For our submission to the ZeroSpeech 2019 challenge, we apply discrete latent-variable neural networks to unlabelled speech and use the discovered units for speech synthesis. Unsupervised discrete subword modelling could be useful for studies of phonetic category learning in infants or in low-resource speech technology requiring symbolic input. We use an autoencoder (AE) architecture with intermediate discretisation. We decouple acoustic unit discovery from speaker modelling by conditioning the AE’s decoder on the training speaker identity. At test time, unit discovery is performed on speech from an unseen speaker, followed by unit decoding conditioned on a known target speaker to obtain reconstructed filterbanks. This output is fed to a neural vocoder to synthesise speech in the target speaker’s voice. For discretisation, categorical variational autoencoders (CatVAEs), vector-quantised VAEs (VQ-VAEs) and straight-through estimation are compared at different compression levels on two languages. Our final model uses convolutional encoding, VQ-VAE discretisation, deconvolutional decoding and an FFTNet vocoder. We show that decoupled speaker conditioning intrinsically improves discrete acoustic representations, yielding competitive synthesis quality compared to the challenge baseline.
http://arxiv.org/abs/1904.07556
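The VQ-VAE discretisation step at the heart of the final configuration can be sketched as follows (codebook size, dimensions, and the commitment coefficient are assumptions): each encoder output snaps to its nearest codebook vector, with a straight-through gradient so the encoder trains through the non-differentiable lookup.

```python
# A minimal sketch of VQ-VAE discretisation: nearest-codebook lookup in the
# forward pass, straight-through gradients in the backward pass, plus the
# standard codebook and commitment losses.
import torch

class VectorQuantizer(torch.nn.Module):
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = torch.nn.Parameter(torch.randn(num_codes, dim) * 0.1)

    def forward(self, z):
        """z: (N, D) encoder outputs -> quantised (N, D), indices, vq loss."""
        idx = torch.cdist(z, self.codebook).argmin(dim=1)   # nearest code
        q = self.codebook[idx]
        # Straight-through: forward uses q, backward copies gradients to z.
        q_st = z + (q - z).detach()
        # Codebook loss pulls codes to the encoder; commitment loss (0.25)
        # pulls the encoder toward its assigned codes.
        vq_loss = ((q - z.detach()).pow(2).mean()
                   + 0.25 * (z - q.detach()).pow(2).mean())
        return q_st, idx, vq_loss
```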
Predicting the near-future from an input video is a useful task for applications such as autonomous driving and robotics. While most previous works predict a single future, multiple futures with different behaviors can possibly occur. Moreover, if the predicted future is too short, it may not be fully usable by a human or other system. In this paper, we propose a novel method for future video prediction capable of generating multiple long-term futures. This makes the predictions more suitable for real applications. First, from an input human video, we generate sequences of future human poses as the image coordinates of their body-joints by adversarial learning. We generate multiple futures by inputting to the generator combinations of a latent code (to reflect various behaviors) and an attraction point (to reflect various trajectories). In addition, we generate long-term future human poses using a novel approach based on unidimensional convolutional neural networks. Last, we generate an output video based on the generated poses for visualization. We evaluate the generated future poses and videos using three criteria (i.e., realism, diversity and accuracy), and show that our proposed method outperforms other state-of-the-art works.
http://arxiv.org/abs/1904.07538
Accurate detection of 3D objects is a fundamental problem in computer vision with an enormous impact on autonomous cars, augmented/virtual reality, and many robotics applications. In this work we present a novel fusion of a neural-network-based state-of-the-art 3D detector and visual semantic segmentation in the context of autonomous driving. Additionally, we introduce the Scale-Rotation-Translation score (SRTs), a fast and highly parameterizable evaluation metric for comparing object detections, which speeds up our inference time by up to 20\% and halves training time. On top, we apply state-of-the-art online multi-target feature tracking to the object measurements to further increase accuracy and robustness by utilizing temporal information. Our experiments on KITTI show that we achieve the same results as the state of the art in all related categories while maintaining the performance-accuracy trade-off and still running in real time. Furthermore, our model is the first to fuse visual semantics with 3D object detection.
http://arxiv.org/abs/1904.07537