A number of applications of deep networks to interplanetary trajectory design have recently been proposed. These approaches often rely on the availability of a large number of optimal trajectories to learn from. In this paper we introduce a new method to quickly create millions of optimal spacecraft trajectories from a single nominal trajectory. Apart from the generation of the nominal trajectory, no additional optimal control problems need to be solved, as all the trajectories, by construction, satisfy Pontryagin’s minimum principle and the relevant transversality conditions. We then consider deep feedforward neural networks and benchmark three learning methods on the created dataset: policy imitation, value function learning and value function gradient learning. Our results are shown for the interplanetary trajectory optimization problem of reaching Venus orbit, with the nominal trajectory starting from the Earth. We find that both policy imitation and value function gradient learning are able to learn the optimal state feedback, while value function learning captures only the final value of the optimal propellant mass, not the optimal policy.
http://arxiv.org/abs/1904.08809
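Of the three benchmarks, value function gradient learning is the least standard. Below is a minimal sketch of what such a loss could look like, assuming a network net mapping states to a scalar cost-to-go and training targets consisting of the optimal value v_opt and the co-states (which, along an optimal trajectory, coincide with the gradient of the value function); the weighting beta is illustrative.

```python
import torch

def value_gradient_loss(net, states, v_opt, costates, beta=1.0):
    """Fit the optimal cost-to-go and match its gradient to the co-states.
    Names and the weighting beta are illustrative, not the paper's code."""
    states = states.clone().requires_grad_(True)
    v = net(states).squeeze(-1)                       # predicted cost-to-go
    (grad,) = torch.autograd.grad(v.sum(), states, create_graph=True)
    value_term = ((v - v_opt) ** 2).mean()
    grad_term = ((grad - costates) ** 2).mean()       # co-state matching
    return value_term + beta * grad_term
```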
Although influence maximization has been studied extensively in the past, the majority of works focus on the algorithmic aspect of the problem, overlooking several practical improvements that can be derived from data-driven observations or the inclusion of machine learning. The main challenges are, on the one hand, the computational demand of the algorithmic solutions, which restricts their scalability, and, on the other, the quality of the predicted influence spread. In this work, we propose IMINFECTOR (Influence Maximization with INFluencer vECTORs), a method that aspires to address both problems using representation learning. It comprises two parts. The first is a multi-task neural network that uses logs of diffusion cascades to embed the diffusion probabilities between nodes as well as the ability of a node to create massive cascades. The second part uses the diffusion probabilities to reformulate influence maximization as a weighted bipartite matching problem and capitalizes on the learned representations to find a seed set with a greedy heuristic. We apply our method to three sizable networks accompanied by diffusion cascades and evaluate it using unseen diffusion cascades from future time steps. Our method outperforms various competitive algorithms and metrics from the diverse landscape of influence maximization in terms of prediction precision and seed set quality.
http://arxiv.org/abs/1904.08804
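To illustrate the second part, here is a hedged sketch of greedy seed selection under probabilistic coverage; the influence matrix of learned diffusion probabilities is a stand-in, and the actual IMINFECTOR heuristic additionally exploits the learned cascade-size estimates.

```python
import numpy as np

def greedy_seed_selection(influence, k):
    """influence: (n_candidates, n_targets) learned diffusion probabilities.
    Greedily adds the candidate with the largest marginal coverage gain."""
    covered = np.zeros(influence.shape[1])
    seeds = []
    for _ in range(k):
        gains = np.maximum(influence, covered).sum(axis=1) - covered.sum()
        gains[seeds] = -np.inf                # never pick a seed twice
        best = int(np.argmax(gains))
        seeds.append(best)
        covered = np.maximum(covered, influence[best])
    return seeds

probs = np.random.rand(500, 2000) * 0.05      # toy learned probabilities
print(greedy_seed_selection(probs, k=10))
```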
Autonomous UAV racing has recently emerged as an interesting research problem. The dream is to beat humans in this new fast-paced sport. A common approach is to learn an end-to-end policy that directly predicts controls from raw images by imitating an expert. However, such a policy is limited by the expert it imitates, and scaling to other environments and vehicle dynamics is difficult. One approach to overcome the drawbacks of an end-to-end policy is to train a network only on the perception task and handle control with a PID or MPC controller. However, a single controller must be extensively tuned and cannot usually cover the whole state space. In this paper, we propose learning an optimized controller using a DNN that fuses multiple controllers. The network learns a robust controller with online trajectory filtering, which suppresses noisy trajectories and imperfections of individual controllers. The result is a network that is able to learn a good fusion of filtered trajectories from different controllers, leading to significant improvements in overall performance. We compare our trained network to the controllers it has learned from, end-to-end baselines and human pilots in a realistic simulation; our network beats all baselines in extensive experiments and approaches the performance of a professional human pilot. A video summarizing this work is available at https://youtu.be/hGKlE5X9Z5U
http://arxiv.org/abs/1904.08801
Rumours have existed for a long time and are known to have serious consequences. The rapid growth of social media platforms has multiplied their negative impact; it has thus become important to detect them early. Many methods have been introduced to detect rumours using the content or the social context of news. However, most existing methods ignore, or do not effectively exploit, the propagation pattern of news in social media, including the sequence of interactions of social media users with news across time. In this work, we propose a novel method for rumour detection based on deep learning. Our method leverages the propagation process of the news by learning the users’ representations and the temporal interrelation of users’ responses. Experiments conducted on Twitter and Weibo datasets demonstrate the state-of-the-art performance of the proposed method.
http://arxiv.org/abs/1905.03042
Despite excellent performance on stationary test sets, deep neural networks (DNNs) can fail to generalize to out-of-distribution (OoD) inputs, including natural, non-adversarial ones, which are common in real-world settings. In this paper, we present a framework for discovering DNN failures that harnesses 3D renderers and 3D models. That is, we estimate the parameters of a 3D renderer that cause a target DNN to misbehave in response to the rendered image. Using our framework and a self-assembled dataset of 3D objects, we investigate the vulnerability of DNNs to OoD poses of well-known objects in ImageNet. For objects that are readily recognized by DNNs in their canonical poses, DNNs incorrectly classify 97% of their pose space. In addition, DNNs are highly sensitive to slight pose perturbations. Importantly, adversarial poses transfer across models and datasets. We find that 99.9% and 99.4% of the poses misclassified by Inception-v3 also transfer to the AlexNet and ResNet-50 image classifiers trained on the same ImageNet dataset, respectively, and 75.5% transfer to the YOLOv3 object detector trained on MS COCO.
http://arxiv.org/abs/1811.11553
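As a rough illustration of the search idea (not the paper's actual procedure, which also estimates renderer parameters directly), the sketch below randomly samples pose parameters and keeps those the classifier gets wrong; render and classify are user-supplied stand-ins, and the pose parameterization is an assumption.

```python
import numpy as np

def find_adversarial_poses(render, classify, true_label, n_trials=1000, seed=0):
    """Collect renderer poses that flip the classifier's prediction.
    `render` and `classify` are user-supplied callables (stand-ins)."""
    rng = np.random.default_rng(seed)
    adversarial = []
    for _ in range(n_trials):
        pose = {
            "yaw": rng.uniform(0.0, 360.0),
            "pitch": rng.uniform(-90.0, 90.0),
            "roll": rng.uniform(0.0, 360.0),
            "depth": rng.uniform(2.0, 10.0),   # distance from the camera
        }
        if classify(render(pose)) != true_label:
            adversarial.append(pose)
    return adversarial
```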
To guarantee safe and efficient motion planning for autonomous driving in dynamic traffic environments, the autonomous vehicle should be equipped with a policy that is not only optimal but also efficient over the long term in complex scenarios. The first challenge is that acquiring the optimal planning trajectory comes at the cost of planning efficiency. The second challenge is that most search-based planning methods cannot find the desired trajectory in extreme scenarios. In this paper, we propose a data-driven approach to motion planning that addresses both challenges. We formulate the lane-change task as a Mixed-Integer Quadratic Program with logical constraints, allowing the planning module to provide feasible, safe and comfortable actions in more complex scenarios. Furthermore, we propose a hierarchical learning structure to guarantee online, fast and more generalized motion planning. Our approach’s performance is demonstrated in a simulated lane-change scenario and compared with related planning methods.
http://arxiv.org/abs/1904.08784
Gender bias strongly affects natural language processing applications. Word embeddings have been clearly shown both to preserve and to amplify gender biases present in current data sources. Recently, contextualized word embeddings have enhanced previous word embedding techniques by computing word vector representations that depend on the sentence they appear in. In this paper, we study the impact of this conceptual change in word embedding computation with respect to gender bias. Our analysis includes different measures previously applied in the literature to standard word embeddings. Our findings suggest that contextualized word embeddings are less biased than standard ones, even when the latter are debiased.
http://arxiv.org/abs/1904.08783
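For concreteness, one of the standard measures referred to above is direct bias along a gender direction; below is a minimal sketch with toy vectors (in practice the vectors would come from static embeddings or from pooling contextualized representations of the same word across sentences).

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def direct_bias(word_vec, he_vec, she_vec):
    """Projection of a word vector onto the he-she gender direction."""
    return cosine(word_vec, he_vec - she_vec)

# Toy stand-ins for real embedding vectors.
rng = np.random.default_rng(0)
he, she, nurse = rng.normal(size=(3, 50))
print(f"direct bias(nurse) = {direct_bias(nurse, he, she):+.3f}")
```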
Cloud computing involves complex technical and economic systems and interactions. This brings about various challenges, two of which are: (1) debugging and control to optimize computing systems with the help of sandbox experiments, and (2) prediction of the cost of 'spot' resources for decision making of cloud clients. In this paper, we formalize debugging by counterfactual probabilities and control by post-(soft-)interventional probabilities. We prove that counterfactuals can approximately be calculated from a 'stochastic' graphical causal model (while they are originally defined only for 'deterministic' functional causal models), and based on this sketch an approach to address problem (1). To address problem (2), we formalize bidding by post-(soft-)interventional probabilities and present a simple mathematical result on approximate integration of 'incomplete' conditional probability distributions. We show how this can be used by cloud clients to trade off privacy against predictability of the outcome of their bidding actions in a toy scenario. We report experiments on simulated and real data.
http://arxiv.org/abs/1603.01581
How to model and encode the semantics of human-written text and how to select the type of neural network to process it are not settled issues in sentiment analysis. Accuracy and transferability are critical issues in machine learning in general, and both are closely related to the loss estimates for the trained model. I present a computationally efficient and accurate feedforward neural network for sentiment prediction capable of maintaining low losses. When coupled with an effective semantics model of the text, it provides highly accurate models with low losses. Experimental results on representative benchmark datasets and comparisons to other methods show the advantages of the new approach.
http://arxiv.org/abs/1904.12624
Existing joint optic disc and cup segmentation approaches are developed in either a Cartesian or a polar coordinate system. However, due to the subtle appearance of the optic cup, the contextual information exploited from a single domain, even by prevailing CNNs, is still insufficient. In this paper, we propose a novel segmentation approach, named the Cartesian-polar dual-domain network (DDNet), which for the first time considers the complementarity of the Cartesian domain and the polar domain. We propose a two-branch domain feature encoder that learns, in parallel, translation-equivariant representations on the rectilinear grid of the Cartesian domain and rotation-equivariant representations on the polar grid of the polar domain. To fuse the features on the two different grids, we propose a dual-domain fusion module. This module builds the correspondence between the two grids via a differentiable polar transform layer and learns feature importance across the two domains element-wise to enhance expressive capability. Finally, the decoder aggregates the fused features from low level to high level and makes dense predictions. We validate the state-of-the-art segmentation performance of our DDNet on the public ORIGA dataset. From the segmentation masks, we estimate the commonly used clinical measure for glaucoma, i.e., the vertical cup-to-disc ratio. The low cup-to-disc ratio estimation error demonstrates the potential application in glaucoma screening.
http://arxiv.org/abs/1904.08773
Machine learning-based imaging diagnostics has recently reached or even surpassed the level of clinical experts in several clinical domains. However, the classification decisions of a trained machine learning system are typically non-transparent, a major hindrance to clinical integration, error tracking and knowledge discovery. In this study, we present a transparent deep learning framework relying on convolutional neural networks (CNNs) and layer-wise relevance propagation (LRP) for diagnosing multiple sclerosis (MS). MS is commonly diagnosed utilizing a combination of clinical presentation and conventional magnetic resonance imaging (MRI), specifically the occurrence and presentation of white matter lesions in T2-weighted images. We hypothesized that using LRP in a naive predictive model would enable us to uncover the relevant image features that a trained CNN uses for decision-making. Since imaging markers in MS are well established, this would enable us to validate the respective CNN model. First, we pre-trained a CNN on MRI data from the Alzheimer’s Disease Neuroimaging Initiative (n = 921), afterwards specializing the CNN to discriminate between MS patients and healthy controls (n = 147). Using LRP, we then produced a heatmap for each subject in the holdout set depicting the voxel-wise relevance for a particular classification decision. The CNN model achieved a balanced accuracy of 87.04% and an area under the receiver operating characteristic curve of 96.08%. The subsequent LRP visualization revealed that the CNN model indeed focuses on individual lesions, but also incorporates additional information such as lesion location, non-lesional white matter or gray matter areas such as the thalamus, which are established conventional and advanced MRI markers in MS. We conclude that LRP and the proposed framework have the capability to make diagnostic decisions of…
http://arxiv.org/abs/1904.08771
Unlike animals, robots can record their experience accurately for a long time. We propose a novel algorithm that runs a particle filter on the time sequence of this experience. It can be applied to teach-and-replay tasks. In such a task, the trainer controls a robot, and the robot records its sensor readings and its actions. We name the resulting sequence of records an episode, a term derived from the episodic memory of animals. Afterwards, the robot runs the particle filter to find, within the episode, a situation similar to the current one. If the robot chooses the action taken in that similar situation, it can replay the taught behavior. We name this algorithm the particle filter on episode (PFoE). A robot with PFoE exhibits not only simple replay of a behavior but also recovery from skids and interruptions. In this paper, we evaluate the properties of PFoE with a small mobile robot.
http://arxiv.org/abs/1904.08761
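A minimal sketch of the PFoE idea as described above: particles live on episode time indices, are weighted by sensor similarity, resampled, and advanced. The Gaussian weighting and the simple forward motion model are assumptions, not the paper's exact choices.

```python
import numpy as np

def pfoe_step(particles, episode_obs, current_obs, sigma=0.5, rng=None):
    """One particle-filter update over episode time indices."""
    rng = rng or np.random.default_rng()
    # Weight each particle by how well its recorded reading matches now.
    diffs = episode_obs[particles] - current_obs
    w = np.exp(-np.sum(diffs ** 2, axis=1) / (2 * sigma ** 2)) + 1e-12
    w /= w.sum()
    # Resample in proportion to the weights, then advance along the episode.
    resampled = rng.choice(particles, size=particles.size, p=w)
    return np.minimum(resampled + 1, len(episode_obs) - 1)

# Usage: replay the action recorded at the most supported episode index.
episode_obs = np.random.rand(200, 4)              # taught sensor readings
episode_act = np.random.randint(0, 3, size=200)   # taught actions
particles = np.random.randint(0, 200, size=100)
particles = pfoe_step(particles, episode_obs, episode_obs[57])
action = episode_act[np.bincount(particles).argmax()]
```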
In many robotics and VR/AR applications, 3D-videos are readily available sources of input (a continuous sequence of depth images, or LIDAR scans). However, those 3D-videos are processed frame-by-frame either through 2D convnets or 3D perception algorithms. In this work, we propose 4-dimensional convolutional neural networks for spatio-temporal perception that can directly process such 3D-videos using high-dimensional convolutions. For this, we adopt sparse tensors and propose the generalized sparse convolution that encompasses all discrete convolutions. To implement the generalized sparse convolution, we create an open-source auto-differentiation library for sparse tensors that provides extensive functions for high-dimensional convolutional neural networks. We create 4D spatio-temporal convolutional neural networks using the library and validate them on various 3D semantic segmentation benchmarks and on proposed 4D datasets for 3D-video perception. To overcome challenges in the 4D space, we propose the hybrid kernel, a special case of the generalized sparse convolution, and the trilateral-stationary conditional random field that enforces spatio-temporal consistency in the 7D space-time-chroma space. Experimentally, we show that convolutional neural networks with only generalized 3D sparse convolutions can outperform 2D or 2D-3D hybrid methods by a large margin. Also, we show that on 3D-videos, 4D spatio-temporal convolutional neural networks are robust to noise, outperform 3D convolutional neural networks, and are faster than their 3D counterparts in some cases.
http://arxiv.org/abs/1904.08755
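For readers unfamiliar with sparse tensors, the sketch below shows the generic coordinate-plus-feature representation that such 4D networks consume; library-specific APIs are deliberately omitted, and the grid size is illustrative.

```python
import numpy as np

# Only occupied (x, y, z, t) sites are stored: a coordinate matrix plus a
# feature matrix, instead of a dense 64**4 grid.
n_points = 1000
coords = np.random.randint(0, 64, size=(n_points, 4))    # x, y, z, t indices
feats = np.random.randn(n_points, 3).astype(np.float32)  # e.g. RGB per site
# A generalized sparse convolution visits only these coordinates (plus their
# kernel-defined neighbors), which is what makes 4D networks tractable.
```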
Most intelligent transportation systems use a combination of radar sensors and cameras for robust vehicle perception. The calibration of these heterogeneous sensor types in an automatic fashion during system operation is challenging due to differing physical measurement principles and the high sparsity of traffic radars. We propose - to the best of our knowledge - the first data-driven method for automatic rotational radar-camera calibration without dedicated calibration targets. Our approach is based on a coarse and a fine convolutional neural network. We employ a boosting-inspired training algorithm, where we train the fine network on the residual error of the coarse network. Due to the unavailability of public datasets combining radar and camera measurements, we recorded our own real-world data. We demonstrate that our method is able to reach precise and robust sensor registration and show its generalization capabilities to different sensor alignments and perspectives.
http://arxiv.org/abs/1904.08743
Existing state-of-the-art salient object detection networks rely on aggregating multi-level features of pre-trained convolutional neural networks (CNNs). Compared to high-level features, low-level features contribute less to performance but cost more computation because of their larger spatial resolutions. In this paper, we propose a novel Cascaded Partial Decoder (CPD) framework for fast and accurate salient object detection. On the one hand, the framework constructs a partial decoder that discards the larger-resolution features of shallower layers for acceleration. On the other hand, we observe that integrating features of deeper layers yields a relatively precise saliency map. We therefore directly utilize the generated saliency map to refine the features of the backbone network. This strategy efficiently suppresses distractors in the features and significantly improves their representation ability. Experiments conducted on five benchmark datasets show that the proposed model not only achieves state-of-the-art performance but also runs much faster than existing models. Moreover, the proposed framework can be applied to existing multi-level feature aggregation models and significantly improves their efficiency and accuracy.
http://arxiv.org/abs/1904.08739
Deep learning techniques have recently been applied to fundus image analysis and diabetic retinopathy detection. Microaneurysms are an important indicator of diabetic retinopathy progression. We introduce a two-stage deep learning approach for microaneurysm segmentation using multiple scales of the input, with selective sampling and an embedding triplet loss. The model first segments at two scales, and the segmentations are then refined with a classification model. To enhance the discriminative power of the classification model, we incorporate a triplet embedding loss with a selective sampling routine. The model is evaluated quantitatively to assess the segmentation performance and qualitatively to analyze the model predictions. This approach yields a 30.29% relative improvement over a fully convolutional neural network.
http://arxiv.org/abs/1904.12732
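The triplet embedding loss used in the second stage has the standard form sketched below; the margin value and the PyTorch framing are assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(a, p) - d(a, n) + margin), averaged over the batch."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```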
Collaborative content creation inevitably reaches situations where different points of view lead to conflict. We focus on Wikipedia, the free encyclopedia anyone may edit, where disputes about content in controversial articles often reflect larger societal debates. While Wikipedia has a public edit history and discussion section for every article, the substance of these sections is difficult to fathom for Wikipedia users interested in the development of an article and in locating which topics were most controversial. In this paper we present Contropedia, a tool that augments Wikipedia articles and gives insight into the development of controversial topics. Contropedia uses an efficient, language-agnostic measure based on the edit history that focuses on wiki links to easily identify which topics within a Wikipedia article have been most controversial and when.
http://arxiv.org/abs/1904.08721
We propose a method that substantially improves the efficiency of deep distance metric learning based on the optimization of the triplet loss function. One epoch of such a training process based on a naive optimization of the triplet loss function has a run-time complexity O(N^3), where N is the number of training samples. Such optimization scales poorly, and the most common approach proposed to address this high complexity issue is based on sub-sampling the set of triplets needed for the training process. Another approach explored in the field relies on an ad-hoc linearization (in terms of N) of the triplet loss that introduces class centroids, which must be optimized using the whole training set for each mini-batch - this means that a naive implementation of this approach has run-time complexity O(N^2). This complexity issue is usually mitigated with poor, but computationally cheap, approximate centroid optimization methods. In this paper, we first propose a solid theory on the linearization of the triplet loss with the use of class centroids, where the main conclusion is that our new linear loss represents a tight upper bound to the triplet loss. Furthermore, based on this theory, we propose a training algorithm that no longer requires the centroid optimization step, which means that our approach is the first in the field with a guaranteed linear run-time complexity. We show that the training of deep distance metric learning methods using the proposed upper bound is substantially faster than triplet-based methods, while producing competitive retrieval accuracy results on benchmark datasets (CUB-200-2011 and CAR196).
http://arxiv.org/abs/1904.08720
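To convey why replacing triplets with class centroids removes the cubic cost, here is a heavily simplified centroid surrogate that is linear in the number of samples; the paper's actual bound and constants differ, so this is only an illustration of the idea.

```python
import torch
import torch.nn.functional as F

def centroid_surrogate_loss(embeddings, labels, centroids, margin=0.5):
    """Pull each sample to its class centroid, push from the nearest other
    centroid; cost is linear in the batch, with no triplet enumeration."""
    d_own = (embeddings - centroids[labels]).pow(2).sum(dim=1)
    d_all = torch.cdist(embeddings, centroids).pow(2)     # (B, C)
    mask = F.one_hot(labels, centroids.size(0)).bool()
    d_other = d_all.masked_fill(mask, float("inf")).min(dim=1).values
    return torch.relu(d_own - d_other + margin).mean()
```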
We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that has previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where most approaches to image understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. To enable a thorough investigation of the problem of gist understanding, we produce a gold standard of over 300 image-caption pairs and over 8,000 gist annotations covering a wide variety of topics at different levels of abstraction. We use this dataset to experimentally benchmark the contribution of signals from heterogeneous sources, namely image and text. The best result, with a Mean Average Precision (MAP) of 0.69, indicates that by combining both dimensions we are able to better understand the meaning of our image-caption pairs than when using language or vision information alone. We test the robustness of our gist detection approach when receiving automatically generated input, i.e., using automatically generated image tags or generated captions, and prove the feasibility of an end-to-end automated process.
http://arxiv.org/abs/1904.08709
We studied the diffusion coefficient of hot electrons in GaN crystals in moderate electric (1…10 kV/cm) and magnetic (1…4 T) fields. Two configurations, parallel and crossed fields, are analysed. The study was carried out for compensated bulk-like GaN samples at different lattice temperatures (30…300 K) and impurity concentrations (10^16…10^17 cm^{-3}). We found that at low lattice temperatures and low impurity concentrations, the electric-field dependencies of the transverse-to-current components of the diffusion tensor are non-monotonic for both configurations, while the diffusion processes are strongly controlled by the magnetic field. With an increase of the lattice temperature or the impurity concentration, the behaviour of the diffusion tensor becomes more monotonic and less affected by the magnetic field. We showed that such behaviour of the diffusion processes is due to the distinct kinetics of the hot electrons in polar semiconductors with strong electron-optical phonon coupling. We suggest that measurements of the diffusion coefficient of electrons subjected to electric and magnetic fields facilitate the identification of features of different electron transport regimes and the development of more efficient devices and practical applications.
https://arxiv.org/abs/1904.08708
Generalized zero-shot action recognition is a challenging problem, where the task is to recognize new action categories that are unavailable during the training stage, in addition to the seen action categories. Existing approaches suffer from the inherent bias of the learned classifier towards the seen action categories. As a consequence, unseen category samples are incorrectly classified as belonging to one of the seen action categories. In this paper, we set out to tackle this issue by arguing for a separate treatment of seen and unseen action categories in generalized zero-shot action recognition. We introduce an out-of-distribution detector that determines whether a video feature belongs to a seen or an unseen action category. To train our out-of-distribution detector, video features for unseen action categories are synthesized using generative adversarial networks trained on seen action category features. To the best of our knowledge, we are the first to propose an out-of-distribution-detector-based generalized zero-shot learning (GZSL) framework for action recognition in videos. Experiments are performed on three action recognition datasets: Olympic Sports, HMDB51 and UCF101. For generalized zero-shot action recognition, our proposed approach outperforms the baseline (f-CLSWGAN) with absolute gains (in classification accuracy) of 7.0%, 3.4%, and 4.9%, respectively, on these datasets.
http://arxiv.org/abs/1904.08703
Environment modeling in autonomous driving is realized by two fundamental approaches: grid-based and feature-based. The two methods interpret the environment differently, and each offers situation-dependent advantages. In order to exploit the advantages of both, a combination makes sense. This work presents a fusion approach that first establishes an association between the two environment-model representations and then, decoupled from this, fuses their information. Thus, there is no need to adapt the environment models. The developed fusion generates new hypotheses that are closer to reality than either representation alone. The algorithm itself makes no object-model assumptions, so the fusion can be applied to different object hypotheses. In addition, the combination allows objects to be tracked over a longer period of time. The approach is evaluated quantitatively on real sequences in real time.
http://arxiv.org/abs/1904.08701
In the last few years, machine learning techniques, in particular convolutional neural networks, have been investigated as a method to replace or complement the traditional matched filtering techniques that are used to detect the gravitational-wave signature of merging black holes. However, to date, these methods have not yet been successfully applied to the analysis of long stretches of data recorded by the Advanced LIGO and Virgo gravitational-wave observatories. In this work, we critically examine the use of convolutional neural networks as a tool to search for merging black holes. We identify the strengths and limitations of this approach, highlight some common pitfalls in translating between machine learning and gravitational-wave astronomy, and discuss the interdisciplinary challenges. In particular, we explain in detail why convolutional neural networks alone cannot be used to claim a statistically significant gravitational-wave detection. However, we demonstrate how they can still be used to rapidly flag the times of potential signals in the data for a more detailed follow-up. Our convolutional neural network architecture, as well as the proposed performance metrics, are better suited for this task than a standard binary classification scheme. A detailed evaluation of our approach on Advanced LIGO data demonstrates the potential of such systems as trigger generators. Finally, we sound a note of caution by constructing adversarial examples, which showcase interesting “failure modes” of our model, where inputs with no visible resemblance to real gravitational-wave signals are identified as such by the network with high confidence.
https://arxiv.org/abs/1904.08693
In this paper, we explore the possibility of generating artificial biomedical images that can be used as a substitute for real image datasets in applied machine learning tasks. We focus on the generation of realistic chest X-ray images as well as lymph node histology images using two recent GAN architectures, DCGAN and PGGAN. The possibility of using artificial images instead of real ones for training machine learning models was examined on benchmark classification tasks solved using conventional and deep learning methods. In particular, a comparison was made by replacing real images with synthetic ones at the model training stage and comparing the prediction results with those obtained when training on the real image data. We found that the drop in classification accuracy caused by such training data substitution ranged between 2.2% and 3.5% for deep learning models and between 5.5% and 13.25% for conventional methods such as LBP + Random Forests.
http://arxiv.org/abs/1904.08688
Hashing methods have been widely investigated for fast approximate nearest neighbor search in large data sets. Most existing methods use binary vectors in lower-dimensional spaces to represent data points that are usually real vectors of higher dimensionality. We divide the hashing process into two steps. Data points are first embedded in a low-dimensional space, and the global positioning system (GPS) method is subsequently introduced, but modified for binary embedding. We devise data-independent and data-dependent methods to distribute the satellites at appropriate locations. Our methods are based on finding the trade-off between the information losses in these two steps. Experiments show that our data-dependent method outperforms other methods on data sets of different sizes, from 100K to 10M points. By incorporating the orthogonality of the code matrix, both our data-independent and data-dependent methods perform particularly well in experiments with longer bit lengths.
http://arxiv.org/abs/1904.08685
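A hedged sketch of the two-step view: given already low-dimensional embeddings, each bit records whether a point is closer than the median distance to one 'satellite' anchor. Random satellite placement here stands in for the paper's data-independent and data-dependent placements.

```python
import numpy as np

def satellite_codes(X, satellites):
    """Bit i is 1 if a point is closer than the median distance to
    satellite i; Hamming distance then approximates neighborhood."""
    d = np.linalg.norm(X[:, None, :] - satellites[None, :, :], axis=2)
    return (d < np.median(d, axis=0)).astype(np.uint8)

X = np.random.randn(5000, 16)            # low-dimensional embeddings
sats = np.random.randn(32, 16)           # 32 satellites -> 32-bit codes
codes = satellite_codes(X, sats)
hamming = (codes ^ codes[0]).sum(axis=1) # rank all points against a query
```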
We present a method to obtain quantitative profiles of the surface state charge density and monitor its dynamics under various stress conditions in high electron mobility transistor (HEMT) devices. The method employs optical spectroscopy of the channel current at various bias conditions. We test the method on a classical AlGaN/GaN HEMT structure. To analyze the results, we propose a model according to which the energy distribution of the surface charge density may be obtained from the derivative of the channel photocurrent. The proposed method is applied to fully fabricated transistors, and measurements can be taken under any device bias combination. This makes it possible to explore the effect of device operating conditions on the surface state charge, a feature that should be especially useful in studies of the various surface charge migration effects in nitride HEMTs. An important byproduct of the method is a quantitative assessment of the energy position of the surface Fermi level and its dynamics under various bias conditions.
https://arxiv.org/abs/1904.08683
We report the results of Monte Carlo simulation of electron dynamics in stationary and space- and time-dependent electric fields in compensated GaN samples. We have determined the frequency and wavevector dependencies of the dynamic conductivity, $\sigma_{\omega,q}$. We have found that the spatially dependent dynamic conductivity of the drifting electrons can be negative under stationary electric fields of moderate amplitude, $2\ldots5$ kV/cm. This effect is realized in a set of frequency windows. The low-frequency window with negative dynamic conductivity is due to the Cherenkov mechanism. In this case the time-dependent field induces a {\it traveling wave} of the electron concentration in real space and a {\it standing wave} in the energy/momentum space. The higher-frequency windows of negative dynamic conductivity are associated with optical-phonon transit-time resonances. In this case the time-dependent field is accompanied by oscillations of the electron distribution in the form of {\it traveling} waves in both the real space and the energy/momentum space. We discuss the optimal conditions for the observation of these effects. We suggest that the studied negative dynamic conductivity can be used to amplify electromagnetic waves at the expense of the energy of the stationary field and current.
https://arxiv.org/abs/1904.08681
Blur in facial images significantly impedes the efficiency of recognition approaches. However, most existing blind deconvolution methods cannot generate satisfactory results due to their dependence on strong edges, which are sufficient in natural images but not in facial images. In this paper, we represent point spread functions (PSFs) by the linear combination of a set of pre-defined orthogonal PSFs, and similarly, an estimated intrinsic (EI) sharp face image is represented by the linear combination of a set of pre-defined orthogonal face images. In doing so, PSF and EI estimation is simplified to discovering two sets of linear combination coefficients, which are simultaneously found by our proposed coupled learning algorithm. To make our method robust to different types of blurry face images, we generate several candidate PSFs and EIs for a test image, and then, a non-blind deconvolution method is adopted to generate more EIs by those candidate PSFs. Finally, we deploy a blind image quality assessment metric to automatically select the optimal EI. Thorough experiments on the facial recognition technology database, extended Yale face database B, CMU pose, illumination, and expression (PIE) database, and face recognition grand challenge database version 2.0 demonstrate that the proposed approach effectively restores intrinsic sharp face images and, consequently, improves the performance of face recognition.
http://arxiv.org/abs/1904.08671
The application of diffusion in many computer vision and artificial intelligence tasks has been shown to give excellent improvements in performance. One of the main bottlenecks of this technique is the quadratic growth of the kNN graph size, due to the high quantity of new connections between nodes in the graph, resulting in long computation times. Several strategies have been proposed to address this, but none are both effective and efficient. Our novel technique, based on LSH projections, obtains the same performance as the exact kNN graph after diffusion, but in less time (approximately 18 times faster on a dataset of a hundred thousand images). The proposed method was validated and compared with other state-of-the-art methods on several public image datasets, including Oxford5k, Paris6k, and Oxford105k.
http://arxiv.org/abs/1904.08668
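The LSH building block is sketched below: points are bucketed by the sign pattern of random hyperplane projections, and the expensive kNN search then runs only within buckets. Bit width and bucketing details are simplifications, not the paper's configuration.

```python
import numpy as np

def lsh_buckets(X, n_bits=16, seed=0):
    """Hash points into buckets by the sign pattern of random projections."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(X.shape[1], n_bits))
    bits = (X @ planes > 0).astype(np.uint8)      # (N, n_bits) sign bits
    keys = np.packbits(bits, axis=1)
    buckets = {}
    for i in range(len(X)):
        buckets.setdefault(keys[i].tobytes(), []).append(i)
    return buckets

X = np.random.randn(1000, 128)
buckets = lsh_buckets(X)
# The kNN search for diffusion then runs only within (unions of) buckets,
# instead of over all N x N candidate pairs.
```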
Two types of knowledge, triples from knowledge graphs and texts from unstructured documents, have been studied for knowledge-aware open-domain conversation generation: triple attributes or graph paths can narrow down vertex candidates for the knowledge selection decision, and texts can provide rich information for response generation. Fusing a knowledge graph and texts might yield mutually reinforcing advantages for conversation generation, but this has received little study. To address this challenge, we propose a knowledge-aware chatting machine with three components: an augmented knowledge graph containing both triples and texts, a knowledge selector, and a response generator. We formulate knowledge selection on the graph as a multi-hop graph reasoning problem, which is more explainable and flexible than previous approaches. To fully leverage the long text information that differentiates our graph from others, we improve a state-of-the-art reasoning algorithm with machine reading comprehension technology. We demonstrate that, supported by such unified knowledge and an explainable knowledge selection method, our system can generate more appropriate and informative responses than baselines.
http://arxiv.org/abs/1903.10245
Intra-operative ultrasound is an increasingly important imaging modality in neurosurgery. However, manual interaction with imaging data during the procedures, for example to select landmarks or perform segmentation, is difficult and can be time consuming. Yet, as registration to other imaging modalities is required in most cases, some annotation is necessary. We propose a segmentation method based on DeepVNet and specifically evaluate the integration of pre-training with simulated ultrasound sweeps to improve automatic segmentation and enable a fully automatic initialization of registration. We show that despite training on coarse and incomplete semi-automatic annotations, our approach is able to capture the desired superficial structures such as \textit{sulci}, the \textit{cerebellar tentorium}, and the \textit{falx cerebri}. We perform a five-fold cross-validation on the publicly available RESECT dataset. Trained on the dataset alone, we report a Dice and Jaccard coefficient of $0.45 \pm 0.09$ and $0.30 \pm 0.07$ respectively, as well as an average distance of $0.78 \pm 0.36~mm$. With the suggested pre-training, we computed a Dice and Jaccard coefficient of $0.47 \pm 0.10$ and $0.31 \pm 0.08$, and an average distance of $0.71 \pm 0.38~mm$. The qualitative evaluation suggests that with pre-training the network learns to generalize better and provides refined and more complete segmentations compared to the incomplete annotations provided as input.
http://arxiv.org/abs/1904.08655
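For reference, the Dice and Jaccard coefficients reported above can be computed for binary masks as follows (a straightforward sketch, not the paper's evaluation code):

```python
import numpy as np

def dice_jaccard(pred, target, eps=1e-8):
    """Overlap scores between a predicted and a ground-truth binary mask."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum() + eps)
    jaccard = inter / (union + eps)
    return dice, jaccard
```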
In object detection, keypoint-based approaches often suffer from a large number of incorrect object bounding boxes, arguably due to the lack of an additional look into the cropped regions. This paper presents an efficient solution which explores the visual patterns within each cropped region with minimal costs. We build our framework upon a representative one-stage keypoint-based detector named CornerNet. Our approach, named CenterNet, detects each object as a triplet, rather than a pair, of keypoints, which improves both precision and recall. Accordingly, we design two customized modules named cascade corner pooling and center pooling, which play the roles of enriching information collected by both the top-left and bottom-right corners and providing more recognizable information in the central regions, respectively. On the MS-COCO dataset, CenterNet achieves an AP of 47.0%, which outperforms all existing one-stage detectors by at least 4.9%. Meanwhile, with a faster inference speed, CenterNet demonstrates performance quite comparable to the top-ranked two-stage detectors. Code is available at https://github.com/Duankaiwen/CenterNet.
http://arxiv.org/abs/1904.08189
Word embeddings are useful for a wide variety of tasks, but they lack interpretability. By rotating word spaces, interpretable dimensions can be identified while fully preserving the information contained in the embeddings. In this work, we investigate three methods for making word spaces interpretable by rotation: Densifier (Rothe et al., 2016), linear SVMs and DensRay, a new method we propose. While DensRay is closely related to the Densifier, it can be computed in closed form, is hyperparameter-free and is thus more robust than the Densifier. We evaluate the methods on lexicon induction and set-based word analogy and conclude that analytical methods such as DensRay and SVMs are preferable. For word analogy we propose a new method to solve the task which outperforms the previous state of the art by large margins.
http://arxiv.org/abs/1904.08654
Adversarial attacks on machine learning models have seen increasing interest in the past years. By making only subtle changes to the input of a convolutional neural network, the output of the network can be swayed to a completely different result. The first attacks did this by slightly changing the pixel values of an input image to fool a classifier into outputting the wrong class. Other approaches have tried to learn “patches” that can be applied to an object to fool detectors and classifiers. Some of these approaches have also shown that these attacks are feasible in the real world, i.e. by modifying an object and filming it with a video camera. However, all of these approaches target classes that contain almost no intra-class variety (e.g. stop signs). The known structure of the object is then used to generate an adversarial patch on top of it. In this paper, we present an approach to generate adversarial patches for targets with lots of intra-class variety, namely persons. The goal is to generate a patch that is able to successfully hide a person from a person detector. Such an attack could, for instance, be used maliciously to circumvent surveillance systems: intruders could sneak around undetected by holding a small cardboard plate in front of their body, aimed towards the surveillance camera. Our results show that our system is able to significantly lower the accuracy of a person detector. Our approach also functions well in real-life scenarios where the patch is filmed by a camera. To the best of our knowledge, we are the first to attempt this kind of attack on targets with a high level of intra-class variety like persons.
http://arxiv.org/abs/1904.08653
Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), and Memory Networks, all of which contain memory, are popularly used to learn patterns in sequential data. Sequential data often contains long sequences with long-range dependencies. RNNs can handle long sequences but suffer from the vanishing and exploding gradient problems. While LSTMs and other memory networks address this problem, they are not capable of handling long sequences (patterns spanning 50 or more data points). Language modelling tasks that require learning from longer sequences are limited by how much information the memory can retain. This paper introduces the Long Term Memory network (LTM), which can tackle the exploding and vanishing gradient problems and handle long sequences without forgetting. LTM is designed to scale data in the memory and give a higher weight to the input in the sequence. LTM avoids overfitting by scaling the cell state after achieving the optimal results. LTM is tested on the Penn Treebank and Text8 datasets, achieving test perplexities of 83 and 82, respectively. With 650 cells, LTM achieved a test perplexity of 67 on Penn Treebank, and with 600 cells, a test perplexity of 77 on Text8. LTM achieves state-of-the-art results using only ten hidden LTM cells for both datasets.
http://arxiv.org/abs/1904.08936
We present a simple yet effective method to infer detailed full human body shape from only a single photograph. Our model can infer full-body shape, including face, hair, and clothing with wrinkles, at interactive frame rates. Results feature details even on parts that are occluded in the input image. Our main idea is to turn shape regression into an aligned image-to-image translation problem. The input to our method is a partial texture map of the visible region obtained from off-the-shelf methods. From a partial texture, we estimate detailed normal and vector displacement maps, which can be applied to a low-resolution smooth body model to add detail and clothing. Despite being trained purely with synthetic data, our model generalizes well to real-world photographs. Numerous results demonstrate the versatility and robustness of our method.
http://arxiv.org/abs/1904.08645
Style transfer is the problem of rendering a content image in the style of a second, style image. A natural and common practical task in applications of style transfer is adjusting the strength of stylization. The algorithm of Gatys et al. (2016) provides this ability by changing the weighting factors of the content and style losses, but is computationally inefficient. Real-time style transfer, introduced by Johnson et al. (2016), enables fast stylization of any image by passing it through a pre-trained transformer network. Although fast, this architecture is not able to continuously adjust style strength. We propose an extension to real-time style transfer that allows direct control of style strength at inference time, while still requiring only a single transformer network. We conduct qualitative and quantitative experiments demonstrating that the proposed method is capable of smooth stylization-strength control and removes certain stylization artifacts that appear in the original real-time style transfer method. Comparisons with alternative real-time style transfer algorithms capable of adjusting stylization strength show that our method reproduces style with more detail.
http://arxiv.org/abs/1904.08643
We present ConvLab, an open-source multi-domain end-to-end dialog system platform that enables researchers to quickly set up experiments with reusable components and compare a large set of different approaches, ranging from conventional pipeline systems to end-to-end neural models, in common environments. ConvLab offers a set of fully annotated datasets and associated pre-trained reference models. As a showcase, we extend the MultiWOZ dataset with user dialog act annotations to train all component models and demonstrate how ConvLab makes it easy to conduct complicated experiments in multi-domain end-to-end dialog settings.
http://arxiv.org/abs/1904.08637
Domain alignment in convolutional networks aims to learn the degree of layer-specific feature alignment beneficial to the joint learning of source and target datasets. While increasingly popular in convolutional networks, there have been no previous attempts to achieve domain alignment in recurrent networks. Similar to spatial features, both source and target domains are likely to exhibit temporal dependencies that can be jointly learnt and aligned. In this paper we introduce Dual-Domain LSTM (DDLSTM), an architecture that is able to learn temporal dependencies from two domains concurrently. It performs cross-contaminated batch normalisation on both input-to-hidden and hidden-to-hidden weights, and learns the parameters for cross-contamination, for both single-layer and multi-layer LSTM architectures. We evaluate DDLSTM on frame-level action recognition using three datasets, taking a pair at a time, and report an average increase in accuracy of 3.5%. The proposed DDLSTM architecture outperforms standard, fine-tuned, and batch-normalised LSTMs.
http://arxiv.org/abs/1904.08634
Crowd analysis and management is a challenging problem for ensuring public safety and security. Many techniques have been proposed to cope with its various sub-problems. However, the generalization capabilities of these techniques are limited, because they ignore the fact that crowd density changes from low to extremely high depending on the scene under observation. We propose a robust feature-based approach to the problem of crowd management for people's safety and security. We evaluate our method on a benchmark dataset and present a detailed analysis.
http://arxiv.org/abs/1904.12625
In this paper we investigate the problem of image quality assessment (IQA) and enhancement via machine learning. This issue has long attracted a wide range of attention in the computational intelligence and image processing communities, since, for many practical applications, e.g. object detection and recognition, raw images usually need to be appropriately enhanced to raise the visual quality (e.g. visibility and contrast). In fact, proper enhancement can noticeably improve the quality of input images, even beyond that of the originally captured images, which are generally thought to be of the best quality. In this work, we present two main contributions. The first is a new no-reference (NR) IQA model. Given an image, our quality measure first extracts 17 features through analysis of contrast, sharpness, brightness and more, and then yields a measure of visual quality using a regression module, which is learned with a big-data training set much larger than the relevant image datasets. Experiments on nine datasets validate the superiority and efficiency of our blind metric compared with typical state-of-the-art full-, reduced- and no-reference IQA methods. The second contribution is a robust image enhancement framework based on quality optimization. For an input image, guided by the proposed NR-IQA measure, we conduct histogram modification to successively rectify image brightness and contrast to a proper level. Thorough tests demonstrate that our framework can effectively enhance natural images, low-contrast images, low-light images and dehazed images. The source code will be released at https://sites.google.com/site/guke198701/publications.
http://arxiv.org/abs/1904.08632
We address the unsupervised open domain recognition (UODR) problem, where the categories in a labeled source domain S are only a subset of those in an unlabeled target domain T. The task is to correctly classify all samples in T, including known and unknown categories. UODR is challenging due to the domain discrepancy, which becomes even harder to bridge when a large number of unknown categories exist in T. Moreover, the classification rules propagated by a graph CNN (GCN) may be distracted by unknown categories and lack generalization capability. To measure the domain discrepancy under the asymmetric label spaces of S and T, we propose Semantic-Guided Matching Discrepancy (SGMD), which first employs instance matching between S and T and then measures the discrepancy by a weighted feature distance between matched instances. We further design a limited balance constraint to achieve a more balanced classification output over known and unknown categories. We develop the Unsupervised Open Domain Transfer Network (UODTN), which jointly learns the backbone classification network and the GCN by reducing the SGMD, enforcing the limited balance constraint and minimizing the classification loss on S. UODTN better preserves the semantic structure and enforces consistency between the learned domain-invariant visual features and the semantic embeddings. Experimental results show the superiority of our method in recognizing images of both known and unknown categories.
http://arxiv.org/abs/1904.08631
We address the highly challenging problem of video object segmentation. Given only the initial mask, the task is to segment the target in the subsequent frames. In order to effectively handle appearance changes and similar background objects, a robust representation of the target is required. Previous approaches either rely on fine-tuning a segmentation network on the first frame, or employ generative appearance models. Although partially successful, these methods often suffer from impractically low frame rates or unsatisfactory robustness. We propose a novel approach, based on a dedicated target appearance model that is exclusively learned online to discriminate between the target and background image regions. Importantly, we design a specialized loss and customized optimization techniques to enable highly efficient online training. Our light-weight target model is integrated into a carefully designed segmentation network, trained offline to enhance the predictions generated by the target model. Extensive experiments are performed on three datasets. Our approach achieves an overall score of over 70 on YouTube-VOS, while operating at 25 frames per second.
http://arxiv.org/abs/1904.08630
Big data solutions are designed to cope with data of huge Volume and wide Variety, that need to be ingested at high Velocity and have potential Veracity issues, challenging characteristics that are usually referred to as the “4Vs of Big Data”. In order to evaluate possibly complex big data solutions, stress tests require assessing a large number of combinations of sub-components jointly with the possible big data variations. A formalization of the Design of Experiments (DoE) on big data solutions is aimed at ensuring the reproducibility of the experiments, facilitating their partitioning into sub-experiments and guaranteeing the consistency of their outcomes in a global assessment. In this paper, an ontology-based approach is proposed to support the evaluation of a big data system in two ways. Firstly, the approach formalizes a decomposition and recombination of the big data solution, allowing for the aggregation of component evaluation results at the inter-component level. Secondly, existing work on DoE is translated into an ontology for supporting the selection of experiments. The proposed ontology-based approach offers the possibility to combine knowledge from the evaluation domain and the application domain. It exploits domain and inter-domain specific restrictions on the factor combinations in order to reduce the number of experiments. Contrary to existing approaches, the proposed use of ontologies is not limited to the assertional description and exploitation of past experiments but offers richer terminological descriptions for the development of a DoE from scratch. As an application example, a maritime big data solution to the problem of detecting and predicting suspicious vessel behaviour through mobility analysis is selected. The article concludes with a sketch of future work.
http://arxiv.org/abs/1904.08626
TAMER has proven to be a powerful interactive reinforcement learning method that allows ordinary people to teach and personalize autonomous agents’ behavior by providing evaluative feedback. However, a TAMER agent planning with UCT, a Monte Carlo Tree Search strategy, can only update states along its path, which might induce a high learning cost, especially for a physical robot. In this paper, we propose to drive the agent’s exploration along the optimal path and reduce the learning cost by initializing the agent’s reward function via inverse reinforcement learning from demonstration. We test our proposed method in the RL benchmark domain Grid World with different discounts on human reward. Our results show that learning from demonstration can allow a TAMER agent to learn a roughly optimal policy up to the deepest search and encourages the agent to explore along the optimal path. In addition, we find that learning from demonstration can improve learning efficiency by reducing the total feedback and the number of incorrect actions and by increasing the ratio of correct actions needed to obtain an optimal policy, allowing a TAMER agent to converge faster.
http://arxiv.org/abs/1904.08621
Learning disentangled representations from unlabelled data is a non-trivial problem. In this paper we propose the Information Maximising Autoencoder (InfoAE), in which the encoder learns a powerful disentangled representation by maximizing the mutual information between the representation and the given information in an unsupervised fashion. We evaluate our model on the MNIST dataset and achieve 98.9 ($\pm .1$) $\%$ test accuracy with completely unsupervised training.
http://arxiv.org/abs/1904.08613
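A common way to implement the mutual-information term is a variational lower bound: an auxiliary head recovers the code from the representation, and minimizing its cross-entropy maximizes the bound. The sketch below assumes a categorical code; the paper's exact objective and architecture may differ.

```python
import torch
import torch.nn as nn

class CodeHead(nn.Module):
    """Auxiliary head that recovers a categorical code from the encoding z."""
    def __init__(self, z_dim=32, c_dim=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(),
                                 nn.Linear(64, c_dim))

    def forward(self, z):
        return self.net(z)  # logits over the code

def mi_lower_bound_loss(head, z, code_idx):
    # Minimizing this cross-entropy maximizes a lower bound on I(z; code).
    return nn.functional.cross_entropy(head(z), code_idx)
```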
Segmentation is a key step in analyzing and processing medical images. Due to the low fault tolerance in medical imaging, manual segmentation remains the de facto standard in this domain. Moreover, efforts to automate the segmentation process often rely on large amounts of manually labeled data. While existing software supporting manual segmentation is rich in features and delivers accurate results, the time necessary to set it up and get comfortable using it can pose a hurdle for the collection of large datasets. This work introduces a client/server-based online environment, referred to as Studierfenster (studierfenster.at), that can be used to perform manual segmentations directly in a web browser. The aim of providing this functionality in the form of a web application is to ease the collection of ground truth segmentation datasets. Providing a tool that is quickly accessible and usable on a broad range of devices offers the potential to accelerate this process. The manual segmentation workflow of Studierfenster consists of dragging and dropping the input file into the browser window and outlining the object under consideration slice by slice. The final segmentation can then be exported as a file storing its contours and as a binary segmentation mask. In order to evaluate the usability of Studierfenster, a user study was performed, in which users gave a mean rating of 6.3 out of 7.0 possible points when asked about their overall impression of the tool. The evaluation also provides insights into the results achievable with the tool in practice, by presenting two ground truth segmentations performed by physicians.
http://arxiv.org/abs/1904.08610
The emergence of deep learning networks raises a need for algorithms that explain their decisions, so that users and domain experts can be confident using algorithmic recommendations for high-risk decisions. In this paper we leverage the information-rich latent space induced by such models to learn data representations, or prototypes, within these networks in order to elucidate their internal decision-making process. We introduce a novel application of case-based reasoning using prototypes to understand the decisions leading to the classification of time-series data, specifically investigating electrocardiogram (ECG) waveforms for the classification of bradycardia, a slowing of heart rate, in infants. We improve upon existing models by explicitly optimizing for increased prototype diversity, which in turn improves model accuracy by learning regions of the latent space that highlight features for distinguishing classes. We evaluate the hyperparameter space of our model to show robustness in generating diverse prototypes and, additionally, explore the resultant latent space of a deep classification network on ECG waveforms via an interactive tool to visualize the learned prototypical waveforms therein. We show that the prototypes are capable of learning real-world features (in our case study, ECG morphology related to bradycardia) as well as features within sub-classes. Our work leverages a learned prototype framework on two-dimensional time-series data to produce explainable insights during classification tasks.
http://arxiv.org/abs/1904.08935
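One simple way to encourage prototype diversity is to penalize small pairwise distances between prototypes in the latent space; the sketch below is an assumption-laden illustration of that idea, not the paper's exact regularizer.

```python
import torch

def diversity_penalty(prototypes, eps=1e-6):
    """prototypes: (K, D). Larger pairwise distances -> smaller penalty."""
    d2 = torch.cdist(prototypes, prototypes).pow(2)
    K = prototypes.size(0)
    off_diag = d2[~torch.eye(K, dtype=torch.bool)]  # drop self-distances
    return (1.0 / (off_diag + eps)).mean()
```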
We do not speak word by word from scratch; our brain quickly structures a pattern like \textsc{sth do sth at someplace} and then fills in the detailed descriptions. To endow existing encoder-decoder image captioners with such human-like reasoning, we propose a novel framework, learning to Collocate Neural Modules (CNM), to generate the `inner pattern' connecting the visual encoder and the language decoder. Unlike the widely used neural module networks in visual Q\&A, where the language (i.e., the question) is fully observable, CNM for captioning is more challenging, as the language is being generated and is thus only partially observable. To this end, we make the following technical contributions for CNM training: 1) a compact module design: one module for function words and three for visual content words (e.g., nouns, adjectives, and verbs); 2) soft module fusion and multi-step module execution, robustifying the visual reasoning under partial observation; 3) a linguistic loss for the module controller to be faithful to part-of-speech collocations (e.g., an adjective is placed before a noun). Extensive experiments on the challenging MS-COCO image captioning benchmark validate the effectiveness of our CNM image captioner. In particular, CNM achieves a new state-of-the-art 127.9 CIDEr-D on the Karpathy split and a single-model 126.0 c40 on the official server. CNM is also robust to few training samples; e.g., training with only one sentence per image, CNM can halve the performance loss compared to a strong baseline.
http://arxiv.org/abs/1904.08608
In this paper, we propose a general framework for image classification using the attention mechanism and global context, which can be incorporated into various network architectures to improve their performance. To investigate the capability of the global context, we compare four mathematical models and observe that the global context encoded in the category disentangled conditional generative model retains the richest complementary information relative to the baseline classification networks. Based on this observation, we define a novel Category Disentangled Global Context (CDGC) and devise a deep network to obtain it. By attending to CDGC, the baseline networks can identify the objects of interest more accurately, thus improving performance. We apply the framework to many different network architectures to demonstrate its effectiveness and versatility. Extensive results on five publicly available datasets validate that our approach generalizes well and is superior to the state of the art. In addition, the framework can be combined with various self-attention-based methods to further boost performance. Code and pretrained models will be made public upon paper acceptance.
http://arxiv.org/abs/1812.06663