Automatic human affect recognition is a key step towards more natural human-computer interaction. Recent trends include recognition in the wild using a fusion of audiovisual and physiological sensors, a challenging setting for conventional machine learning algorithms. Since 2010, novel deep learning algorithms have been applied increasingly in this field. In this paper, we review the literature on human affect recognition between 2010 and 2017, with a special focus on approaches using deep neural networks. By classifying a total of 950 studies according to their usage of shallow or deep architectures, we are able to show a trend towards deep learning. Reviewing a subset of 233 studies that employ deep neural networks, we comprehensively quantify their applications in this field. We find that deep learning is used to learn (i) spatial feature representations, (ii) temporal feature representations, and (iii) joint feature representations for multimodal sensor data. Exemplary state-of-the-art architectures illustrate the progress. Our findings show the role deep architectures will play in human affect recognition, and can serve as a reference point for researchers working on related applications.
https://arxiv.org/abs/1901.02884
The goal of this paper is to estimate the 6D pose and dimensions of unseen object instances in an RGB-D image. Contrary to “instance-level” 6D pose estimation tasks, our problem assumes that no exact object CAD models are available during either training or testing time. To handle different and unseen object instances in a given category, we introduce a Normalized Object Coordinate Space (NOCS)—a shared canonical representation for all possible object instances within a category. Our region-based neural network is then trained to directly infer the correspondence from observed pixels to this shared object representation (NOCS) along with other object information such as class label and instance mask. These predictions can be combined with the depth map to jointly estimate the metric 6D pose and dimensions of multiple objects in a cluttered scene. To train our network, we present a new context-aware technique to generate large amounts of fully annotated mixed reality data. To further improve our model and evaluate its performance on real data, we also provide a fully annotated real-world dataset with large environment and instance variation. Extensive experiments demonstrate that the proposed method is able to robustly estimate the pose and size of unseen object instances in real environments while also achieving state-of-the-art performance on standard 6D pose estimation benchmarks.
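Combining the predicted NOCS map with the depth map reduces to fitting a similarity transform between corresponding 3D point sets. Below is a minimal numpy sketch of the classic Umeyama fit (the scale gives the metric size, the rotation and translation give the 6D pose); it illustrates the general technique rather than the authors' released code.

import numpy as np

def umeyama_similarity(src, dst):
    """Fit s, R, t such that dst ~ s * R @ src + t (Umeyama, 1991).
    src, dst: (N, 3) arrays, e.g. predicted NOCS coordinates and
    back-projected depth points inside one object mask."""
    mu_src, mu_dst = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)            # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                            # avoid reflections
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src      # isotropic scale -> size
    t = mu_dst - s * R @ mu_src
    return s, R, t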
https://arxiv.org/abs/1901.02970
We present a novel neural network architecture, termed Decomposer-Composer, for semantic structure-aware 3D shape modeling. Our method utilizes an auto-encoder-based pipeline, and produces a novel factorized shape embedding space, where the semantic structure of the shape collection translates into a data-dependent sub-space factorization, and where shape composition and decomposition become simple linear operations on the embedding coordinates. We further propose to model shape assembly using an explicit learned part deformation module, which utilizes a 3D spatial transformer network to perform an in-network volumetric grid deformation, and which allows us to train the whole system end-to-end. The resulting network allows us to perform part-level shape manipulation, unattainable by existing approaches. Our extensive ablation study, comparison to baseline methods and qualitative analysis demonstrate the improved performance of the proposed method.
https://arxiv.org/abs/1901.02968
In this paper, a novel architecture of Recurrent Neural Network (RNN) is designed and evaluated. The proposed RNN adopts a computational memory based on the concept of stigmergy. The basic principle of a Stigmergic Memory (SM) is that each deposit/removal of a quantity in the SM stimulates subsequent deposit/removal activities. Accordingly, successive SM activities tend to reinforce or weaken each other, generating a coherent coordination between the SM activities and the input temporal stimulus. We show that, in a supervised classification problem, the SM encodes the temporal input in an emergent representational model by coordinating the deposit, removal and classification activities. This study lays down a basic framework for the derivation of an SM-RNN. A formal ontology of SM is discussed, and the SM-RNN architecture is detailed. To appreciate the computational power of an SM-RNN, comparable NNs have been selected and trained to solve the MNIST handwritten digits recognition benchmark in its two variants: spatial (sequences of bitmap rows) and temporal (sequences of pen strokes).
https://arxiv.org/abs/1903.01341
Anxiety affects human capabilities and behavior as much as it affects productivity and quality of life, and can be considered a leading cause of depression and suicide. Anxious states are easily detectable by humans thanks to acquired cognition: humans interpret an interlocutor's tone of speech, gestures and facial expressions, and recognize their mental state. There is a need for non-invasive, reliable techniques that perform the complex task of anxiety detection. In this paper, we present DASPS, a database containing recorded electroencephalogram (EEG) signals of 23 participants during anxiety elicitation by means of face-to-face psychological stimuli. EEG signals were captured with the Emotiv Epoc headset, a low-cost, wireless, wearable device. In our study, we investigate the impact of several parameters, notably: trial duration, feature type, feature combination and number of anxiety levels. Our findings show that anxiety is well elicited within 1 second. For instance, a stacked sparse autoencoder with different types of features achieves 83.50% and 74.60% accuracy for 2- and 4-level anxiety detection, respectively. The presented results demonstrate the benefits of using a low-cost EEG headset instead of non-wireless medical devices and create a starting point for new research in the field of anxiety detection.
https://arxiv.org/abs/1901.02942
In this paper, we propose a saliency-based attribute, SalSi, to detect salt dome bodies within seismic volumes. SalSi is based on saliency theory and modeling of the human vision system (HVS). In this work, we aim to highlight the parts of the seismic volume that receive the highest attention from the human interpreter, and based on the salient features of a seismic image, we detect the salt domes. Experimental results show the effectiveness of SalSi on a real seismic dataset acquired from the North Sea, F3 block. For subjective evaluation, we have used the ground truth and the outputs of different salt dome delineation algorithms to validate the results of SalSi. For objective evaluation, we have used receiver operating characteristics (ROC) curves and the area under the curve (AUC) to demonstrate that SalSi is a promising and effective attribute for seismic interpretation.
https://arxiv.org/abs/1901.02937
Industrial robots, in particular serial industrial manipulators, have enabled much recent research in large-scale robotic systems, such as those used in construction robotics and robotics in architecture. However, industrial manipulators have a very low payload-to-weight ratio and are overly generic systems whose hardware and software are too inflexible for task adaptability. Highly mobile large-scale robotic systems often require even heavier mobile bases to move these industrial manipulators around, further lowering the payload-to-weight ratio of such systems. Moreover, such a system architecture is inflexible and eventually reaches its limits when higher mobility is demanded, such as a higher overall reach with the same payload. This paper presents a concept of constrained collaborative mobile agents, where actuated mobile agents are constrained by a passive kinematic structure whose topology can be inexpensively configured according to different function, task and mobility requirements. The type and number of mobile agents and the choice of actuation scheme are other important characteristics of this system, which can be altered to improve system performance. A novel optimization framework for the modeling and kinematic control of such systems is presented, which is flexible with respect to the above-mentioned system elements and characteristics. Finally, two prototypes are presented and used to demonstrate the optimization-driven kinematic control of the system with different topologies.
https://arxiv.org/abs/1901.02935
Actual causation is concerned with the question “what caused what?” Consider a transition between two states within a system of interacting elements, such as an artificial neural network, or a biological brain circuit. Which combination of synapses caused the neuron to fire? Which image features caused the classifier to misinterpret the picture? Even detailed knowledge of the system’s causal network, its elements, their states, connectivity, and dynamics does not automatically provide a straightforward answer to the “what caused what?” question. Counterfactual accounts of actual causation based on graphical models, paired with system interventions, have demonstrated initial success in addressing specific problem cases in line with intuitive causal judgments. Here, we start from a set of basic requirements for causation (realization, composition, information, integration, and exclusion) and develop a rigorous, quantitative account of actual causation that is generally applicable to discrete dynamical systems. We present a formal framework to evaluate these causal requirements that is based on system interventions and partitions, and considers all counterfactuals of a state transition. This framework is used to provide a complete causal account of the transition by identifying and quantifying the strength of all actual causes and effects linking the two consecutive system states. Finally, we examine several exemplary cases and paradoxes of causation and show that they can be illuminated by the proposed framework for quantifying actual causation.
http://arxiv.org/abs/1708.06716
In this paper, we propose a capsule-based neural network model to solve the semantic segmentation problem. By taking advantage of the extractable part-whole dependencies available in capsule layers, we derive the probabilities of the class labels for individual capsules through a recursive, layer-by-layer procedure. We model this procedure as a traceback pipeline and take it as a central piece to build an end-to-end segmentation network. Under the proposed framework, image-level class labels and object boundaries are jointly sought in an explicit manner, which offers a significant advantage over state-of-the-art fully convolutional network (FCN) solutions. Experiments conducted on modified MNIST and neuroimages demonstrate that our model considerably enhances segmentation performance compared to the leading FCN variant.
https://arxiv.org/abs/1901.02920
Artificial intelligence (AI) research enjoyed an initial period of enthusiasm in the 1970s and 80s. But this enthusiasm was tempered by a long interlude of frustration when genuinely useful AI applications failed to be forthcoming. Today, we are experiencing once again a period of enthusiasm, fired above all by the successes of the technology of deep neural networks or deep machine learning. In this paper we draw attention to what we take to be serious problems underlying current views of artificial intelligence encouraged by these successes, especially in the domain of language processing. We then present an alternative approach to language-centric AI, in which we identify a role for philosophy.
https://arxiv.org/abs/1901.02918
To study how mental object representations are related to behavior, we estimated sparse, non-negative representations of objects using human behavioral judgments on images representative of 1,854 object categories. These representations predicted a latent similarity structure between objects, which captured most of the explainable variance in human behavioral judgments. Individual dimensions in the low-dimensional embedding were found to be highly reproducible and interpretable as conveying degrees of taxonomic membership, functionality, and perceptual attributes. We further demonstrated the predictive power of the embeddings for explaining other forms of human behavior, including categorization, typicality judgments, and feature ratings, suggesting that the dimensions reflect human conceptual representations of objects beyond the specific task.
https://arxiv.org/abs/1901.02915
Significance: Late gadolinium enhanced magnetic resonance imaging (LGE-MRI) is the gold-standard technique for myocardial viability assessment. Although the technique accurately reflects the damaged tissue, there is no clinical standard for quantifying myocardial infarction (MI), leaving most algorithms expert-dependent. Objectives and Methods: In this work, a new automatic method for MI quantification from LGE-MRI is proposed. Our novel segmentation approach is devised to accurately detect not only hyper-enhanced lesions but also microvascular-obstructed areas. Moreover, it includes a myocardial disease detection step, which extends the algorithm to work on healthy scans as well. The method is based on a cascade approach: first, diseased slices are identified by a convolutional neural network (CNN); second, a fast coarse scar segmentation is obtained by means of morphological operations; third, the segmentation is refined by a boundary-voxel reclassification strategy using an ensemble of CNNs. For validation, reproducibility analysis and comparison against other methods, we tested the method on a large multi-field expert-annotated LGE-MRI database including healthy and diseased cases. Results and Conclusion: In an exhaustive comparison against nine reference algorithms, the proposed method achieved state-of-the-art segmentation performance and proved to be the only method whose volumetric scar quantification agreed with the expert delineations. Moreover, the method was able to reproduce the intra- and inter-observer variability ranges. It is concluded that the method could suitably be transferred to clinical scenarios.
https://arxiv.org/abs/1901.02911
The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests to prefer an expressive inference model so that the variational approximation is accurate. However, it is often overlooked that an overly-expressive inference model can be detrimental to the test set performance of both the amortized posterior approximator and, more importantly, the generative density estimator. In this paper, we leverage the fact that VAEs rely on amortized inference and propose techniques for amortized inference regularization (AIR) that control the smoothness of the inference model. We demonstrate that, by applying AIR, it is possible to improve VAE generalization on both inference and generative performance. Our paper challenges the belief that amortized inference is simply a mechanism for approximating maximum likelihood training and illustrates that regularization of the amortization family provides a new direction for understanding and improving generalization in VAEs.
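One concrete way to control the smoothness of the inference model is denoising-style regularization: encode a perturbed input so that nearby inputs map to similar posteriors. The sketch below shows that idea inside a standard Gaussian-VAE training step in PyTorch; treat it as one simple instance of inference regularization, not necessarily the paper's exact estimator, and the encoder/decoder callables and noise level are assumptions.

import torch

def vae_loss_with_air(encoder, decoder, x, noise_std=0.1):
    # Smooth the amortized posterior by encoding a perturbed input
    # (denoising-style inference regularization).
    x_noisy = x + noise_std * torch.randn_like(x)
    mu, logvar = encoder(x_noisy)
    std = torch.exp(0.5 * logvar)
    z = mu + std * torch.randn_like(std)        # reparameterization trick
    x_rec = decoder(z)
    rec = torch.nn.functional.mse_loss(x_rec, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl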
http://arxiv.org/abs/1805.08913
Human perception of 3D shapes goes beyond reconstructing them as a set of points or a composition of geometric primitives: we also effortlessly understand higher-level shape structure such as the repetition and reflective symmetry of object parts. In contrast, recent advances in 3D shape sensing focus more on low-level geometry but less on these higher-level relationships. In this paper, we propose 3D shape programs, integrating bottom-up recognition systems with top-down, symbolic program structure to capture both low-level geometry and high-level structural priors for 3D shapes. Because there are no annotations of shape programs for real shapes, we develop neural modules that not only learn to infer 3D shape programs from raw, unannotated shapes, but also to execute these programs for shape reconstruction. After initial bootstrapping, our end-to-end differentiable model learns 3D shape programs by reconstructing shapes in a self-supervised manner. Experiments demonstrate that our model accurately infers and executes 3D shape programs for highly complex shapes from various categories. It can also be integrated with an image-to-shape module to infer 3D shape programs directly from an RGB image, leading to 3D shape reconstructions that are both more accurate and more physically plausible.
http://arxiv.org/abs/1901.02875
Constant technological evolution characterizes the contemporary world: every day, processes that were once manual become computerized. Data are stored in cyberspace, and as a consequence, concern for the security of this environment must increase. Cyber-attacks are growing on a worldwide scale and constitute one of the significant challenges of the century. This article proposes a computational system based on intelligent hybrid models, which, through fuzzy rules, allows the construction of expert systems for cyber-attacks on data, focusing on the SQL injection attack. The tests were performed with real databases of SQL injection attacks on government computers, using fuzzy neural networks. The results obtained demonstrate the feasibility of constructing a system based on fuzzy rules whose classification accuracy for cyber invasions lies within one standard deviation of the state-of-the-art model for this type of problem. The model can help countries prepare to protect their data networks and information systems, and creates opportunities for expert systems that automate the identification of attacks in cyberspace.
http://arxiv.org/abs/1901.02868
Algorithmic assurances from advanced autonomous systems assist human users in understanding, trusting, and using such systems appropriately. Designing these systems with the capacity of assessing their own capabilities is one approach to creating an algorithmic assurance. The idea of 'machine self-confidence' is introduced for autonomous systems. Using a factorization-based framework for self-confidence assessment, one component of self-confidence, called 'solver quality', is discussed in the context of Markov decision processes for autonomous systems. Markov decision processes underlie much of the theory of reinforcement learning, and are commonly used for planning and decision making under uncertainty in robotics and autonomous systems. A 'solver quality' metric is formally defined in the context of decision making algorithms based on Markov decision processes. A method for assessing solver quality is then derived, drawing inspiration from empirical hardness models. Finally, numerical experiments for an unmanned autonomous vehicle navigation problem under different solver, parameter, and environment conditions indicate that the self-confidence metric exhibits the desired properties. Discussion of results and avenues for future investigation are included.
http://arxiv.org/abs/1810.06519
Transformer networks have the potential to learn longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. As a solution, we propose a novel neural architecture, Transformer-XL, that enables the Transformer to learn dependency beyond a fixed length without disrupting temporal coherence. Concretely, it consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the problem of context fragmentation. As a result, Transformer-XL learns dependency that is about 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than the vanilla Transformer during evaluation. Additionally, we improve the state-of-the-art (SoTA) results in bpc/perplexity from 1.06 to 0.99 on enwik8, from 1.13 to 1.08 on text8, from 20.5 to 18.3 on WikiText-103, from 23.7 to 21.8 on One Billion Word, and from 55.3 to 54.5 on Penn Treebank (without finetuning). Our code, pretrained models, and hyperparameters are available in both TensorFlow and PyTorch.
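The segment-level recurrence amounts to caching the previous segment's hidden states and letting the current segment attend over the concatenation, with gradients stopped through the cache. A minimal PyTorch sketch of that mechanism (the relative positional encoding scheme is omitted); the attention module and cache handling here are illustrative, not the released implementation.

import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=512, num_heads=8)

def attend_with_memory(h, mem):
    """h: (seg_len, batch, 512) current segment hidden states.
    mem: cached previous-segment states, detached so gradients
    never flow back into older segments."""
    context = torch.cat([mem.detach(), h], dim=0)   # extended context
    out, _ = attn(h, context, context)              # queries also see memory
    new_mem = h.detach()                            # cache for next segment
    return out, new_mem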
http://arxiv.org/abs/1901.02860
Human Activity Recognition (HAR) is a key building block of many emerging applications such as intelligent mobility, sports analytics, ambient-assisted living and human-robot interaction. With robust HAR, systems will become more human-aware, leading towards much safer and more empathetic autonomous systems. While human pose detection has made significant progress with the dawn of deep convolutional neural networks (CNNs), state-of-the-art research has almost exclusively focused on a single sensing modality, especially video. However, in safety-critical applications it is imperative to utilize multiple sensor modalities for robust operation. To exploit the benefits of state-of-the-art machine learning techniques for HAR, it is extremely important to have multimodal datasets. In this paper, we present a novel multimodal sensor dataset that encompasses nine indoor activities, performed by 16 participants, and captured by four types of sensors that are commonly used in indoor applications and autonomous vehicles. This multimodal dataset is the first of its kind to be made openly available and can be exploited for many applications that require HAR, including sports analytics, healthcare assistance and indoor intelligent mobility. We propose a novel data preprocessing algorithm to enable adaptive feature extraction from the dataset for use by different machine learning algorithms. Through rigorous experimental evaluations, this paper reviews the performance of machine learning approaches to posture recognition, and analyses the robustness of the algorithms. When performing HAR with the RGB-Depth data from our new dataset, machine learning algorithms such as a deep neural network reached a mean accuracy of up to 96.8% for classification across all stationary and dynamic activities.
http://arxiv.org/abs/1901.02858
Graphics Interchange Format (GIF) is a highly portable graphics format that is ubiquitous on the Internet. Despite their small sizes, GIF images often contain undesirable visual artifacts such as flat color regions, false contours, color shift, and dotted patterns. In this paper, we propose GIF2Video, the first learning-based method for enhancing the visual quality of GIFs in the wild. We focus on the challenging task of GIF restoration by recovering information lost in the three steps of GIF creation: frame sampling, color quantization, and color dithering. We first propose a novel CNN architecture for color dequantization. It is built upon a compositional architecture for multi-step color correction, with a comprehensive loss function designed to handle large quantization errors. We then adapt the SuperSlomo network for temporal interpolation of GIF frames. We introduce two large datasets, namely GIF-Faces and GIF-Moments, for both training and evaluation. Experimental results show that our method can significantly improve the visual quality of GIFs, and outperforms direct baseline and state-of-the-art approaches.
http://arxiv.org/abs/1901.02840
Natural human-computer interaction and audio-visual human behaviour sensing systems that achieve robust performance in the wild are needed more than ever, as digital devices become an increasingly indispensable part of our lives. Accurately annotated real-world data are the crux of devising such systems. However, existing databases usually consider controlled settings, low demographic variability, and a single task. In this paper, we introduce the SEWA database of more than 2000 minutes of audio-visual data of 398 people coming from six cultures, 50% female, and uniformly spanning the age range of 18 to 65 years. Subjects were recorded in two different contexts: while watching adverts and while discussing adverts in a video chat. The database includes rich annotations of the recordings in terms of facial landmarks, facial action units (FAU), various vocalisations, mirroring, continuously valued valence, arousal, liking and agreement, and prototypic examples of (dis)liking. This database aims to be an extremely valuable resource for researchers in affective computing and automatic human sensing, and is expected to push forward research in human behaviour analysis, including cultural studies. Along with the database, we provide extensive baseline experiments for automatic FAU detection and automatic valence, arousal and (dis)liking intensity estimation.
http://arxiv.org/abs/1901.02839
In this paper, we aim at tackling a general but interesting cross-modality feature learning question in the remote sensing community — can a limited amount of highly-discriminative (e.g., hyperspectral) training data improve the performance of a classification task that uses a large amount of poorly-discriminative (e.g., multispectral) data? Traditional semi-supervised manifold alignment methods do not perform sufficiently well for such problems, since hyperspectral data is far more expensive to collect at scale than multispectral data. To this end, we propose a novel semi-supervised cross-modality learning framework, called learnable manifold alignment (LeMA). LeMA learns a joint graph structure directly from the data instead of using a given fixed graph defined by a Gaussian kernel function. With the learned graph, we can further capture the data distribution by graph-based label propagation, which enables finding a more accurate decision boundary. Additionally, an optimization strategy based on the alternating direction method of multipliers (ADMM) is designed to solve the proposed model. Extensive experiments on two hyperspectral-multispectral datasets demonstrate the superiority and effectiveness of the proposed method in comparison with several state-of-the-art methods.
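Once a joint graph is available, the label propagation step follows the standard graph-based semi-supervised recipe. A hedged sketch of that step for a given affinity matrix W, in the style of Zhou et al.'s closed-form propagation; the graph learning itself and the ADMM solver are not reproduced here.

import numpy as np

def propagate_labels(W, Y, alpha=0.99):
    """Graph-based label propagation.
    W: (n, n) learned affinity matrix; Y: (n, c) one-hot labels,
    with zero rows for unlabeled samples."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt                 # normalized affinity
    n = W.shape[0]
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)   # closed-form fixed point
    return F.argmax(axis=1)                         # predicted class per sample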
http://arxiv.org/abs/1901.02838
We study the problem of representation learning in goal-conditioned hierarchical reinforcement learning. In such hierarchical structures, a higher-level controller solves tasks by iteratively communicating goals which a lower-level policy is trained to reach. Accordingly, the choice of representation – the mapping of observation space to goal space – is crucial. To study this problem, we develop a notion of sub-optimality of a representation, defined in terms of expected reward of the optimal hierarchical policy using this representation. We derive expressions which bound the sub-optimality and show how these expressions can be translated to representation learning objectives which may be optimized in practice. Results on a number of difficult continuous-control tasks show that our approach to representation learning yields qualitatively better representations as well as quantitatively better hierarchical policies, compared to existing methods (see videos at https://sites.google.com/view/representation-hrl).
http://arxiv.org/abs/1810.01257
We consider the decomposition of a signal over an overcomplete set of vectors. Minimization of the $\ell^1$-norm of the coefficient vector can often retrieve the sparsest solution (so-called “$\ell^1/\ell^0$-equivalence”), a generally NP-hard task, and this fact has powered the field of compressed sensing. Wright et al.’s sparse representation-based classification (SRC) applies this relationship to machine learning, wherein the signal to be decomposed represents the test sample and columns of the dictionary are training samples. We investigate the relationships between $\ell^1$-minimization, sparsity, and classification accuracy in SRC. After proving that the tractable, deterministic approach to verifying $\ell^1/\ell^0$-equivalence fundamentally conflicts with the high coherence between same-class training samples, we demonstrate that $\ell^1$-minimization can still recover the sparsest solution when the classes are well-separated. Further, using a nonlinear transform so that sparse recovery conditions may be satisfied, we demonstrate that approximate (not strict) equivalence is key to the success of SRC.
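SRC itself is compact enough to state in code: decompose the test sample over the dictionary of all training samples with an l1 solver, then assign the class whose coefficients yield the smallest reconstruction residual. A sketch using scikit-learn's Lasso; the penalty weight is illustrative.

import numpy as np
from sklearn.linear_model import Lasso

def src_classify(A, labels, y, alpha=0.01):
    """A: (d, n) dictionary with normalized training samples as columns,
    labels: (n,) class of each column, y: (d,) test sample."""
    # l1-regularized decomposition: min ||y - A x||^2 + alpha * ||x||_1
    x = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(A, y).coef_
    residuals = {}
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)   # keep class-c coefficients only
        residuals[c] = np.linalg.norm(y - A @ x_c)
    return min(residuals, key=residuals.get)  # smallest class residual wins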
http://arxiv.org/abs/1901.02783
In the areas of online communication, commerce and transactions, analyzing the sentiment polarity of texts written in various natural languages has become crucial. While there have been many contributions in resources and studies for the English language, “smaller” languages like Czech have not received much attention. In this survey, we explore the effectiveness of many existing machine learning algorithms for sentiment analysis of Czech Facebook posts and product reviews. We report the sets of optimal parameter values for each algorithm and the scores on both datasets. We finally observe that support vector machines are the best classifier, and that efforts to further increase performance with bagging, boosting or voting ensemble schemes fail.
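The winning configuration can be outlined in a few lines. Below is a hedged sketch of a bag-of-words SVM pipeline of the kind the survey evaluates; the toy Czech examples, features and hyperparameters are illustrative, not the paper's exact setup.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy examples; in the paper's setting these would be the annotated
# Facebook posts or product reviews with their polarity labels.
texts = ["skvělý produkt, doporučuji", "hrozná kvalita, nedoporučuji"]
labels = ["positive", "negative"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # word unigrams + bigrams
    LinearSVC(C=1.0),                     # the survey's best classifier family
)
clf.fit(texts, labels)
print(clf.predict(["skvělý produkt"]))    # -> ['positive']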
http://arxiv.org/abs/1901.02780
Neural Architecture Search (NAS) aims at finding one “single” architecture that achieves the best accuracy for a given task such as image classification. In this paper, we study instance-level variation and demonstrate that instance-awareness is an important yet currently missing component of NAS. Based on this observation, we propose InstaNAS for searching toward instance-level architectures; the controller is trained to search for and form a “distribution of architectures” instead of a single final architecture. During the inference phase, the controller then selects an architecture from the distribution, tailored to each unseen image, to achieve both high accuracy and low latency. The experimental results show that InstaNAS reduces inference latency without compromising classification accuracy. On average, InstaNAS achieves a 48.9% latency reduction on CIFAR-10 and a 40.2% latency reduction on CIFAR-100 with respect to the MobileNetV2 architecture.
https://arxiv.org/abs/1811.10201
We propose a novel grayness index for finding gray pixels and demonstrate its effectiveness and efficiency in illumination estimation. The grayness index, GI in short, is derived using the Dichromatic Reflection Model and is learning-free. The proposed GI allows estimating one or multiple illumination sources in color-biased images. On standard single-illumination and multiple-illumination estimation benchmarks, GI outperforms state-of-the-art statistical methods and many recent deep learning methods. GI is simple and fast: written in a few dozen lines of code, it processes a 1080p image in about 0.4 seconds with non-optimized Matlab code.
https://arxiv.org/abs/1901.03198
The task of multi-label learning is to predict a set of relevant labels for an unseen instance. Traditional multi-label learning algorithms treat each class label as a logical indicator of whether the corresponding label is relevant or irrelevant to the instance, i.e., +1 represents relevant and -1 represents irrelevant. Labels represented by -1 or +1 are called logical labels, and they cannot reflect differences in label importance. However, for real-world multi-label learning problems, the importance of each possible label is generally different. In real applications, it is difficult to obtain label importance information directly, so a method is needed to reconstruct the essential label importance from the logical multi-label data. To solve this problem, we assume that each multi-label instance is described by a vector of latent real-valued labels, which reflect the importance of the corresponding labels. Such labels are called numerical labels. The process of reconstructing the numerical labels from the logical multi-label data, by utilizing the logical label information and the topological structure of the feature space, is called label enhancement. In this paper, we propose a novel multi-label learning framework called LEMLL, i.e., Label Enhanced Multi-Label Learning, which incorporates regression of the numerical labels and label enhancement into a unified framework. Extensive comparative studies validate that the performance of multi-label learning can be improved significantly with label enhancement, and that LEMLL can effectively reconstruct latent label importance information from logical multi-label data.
http://arxiv.org/abs/1706.08323
Transfer learning can address learning tasks on unlabeled data in the target domain by leveraging plenty of labeled data from a different but related source domain. A core issue in transfer learning is to learn a shared feature space in which the distributions of the data from the two domains are matched. This learning process can be termed transfer representation learning (TRL). Feature transformation methods are crucial to the success of TRL. The most commonly used feature transformation method in TRL is kernel-based nonlinear mapping to a high-dimensional space followed by linear dimensionality reduction. However, kernel functions lack interpretability and are difficult to select. To this end, we combine the TSK fuzzy system (TSK-FS) with transfer learning and propose a more intuitive and interpretable modeling method, called transfer representation learning with TSK-FS (TRL-TSK-FS). Specifically, TRL-TSK-FS realizes TRL from two aspects. On the one hand, the data in the source and target domains are transformed into a fuzzy feature space in which the distribution distance between the two domains is minimized. On the other hand, discriminant information and geometric properties of the data are preserved by linear discriminant analysis and principal component analysis. A further advantage arises with the proposed method: the nonlinear transformation is realized by constructing a fuzzy mapping with the antecedent part of the TSK-FS instead of kernel functions, which are difficult to select. Extensive experiments are conducted on text and image datasets. The results clearly show the superiority of the proposed method.
http://arxiv.org/abs/1901.02703
A significant proportion of individuals' daily activities is experienced through digital devices. Smartphones in particular have become one of the preferred interfaces for content consumption and social interaction. Identifying the content embedded in frequently-captured smartphone screenshots is thus a crucial prerequisite for studies of media behavior and health intervention planning that analyze activity interplay and content switching over time. Screenshot images can depict heterogeneous contents and applications, making the a priori definition of adequate taxonomies a cumbersome task, even for humans. Because privacy protection of the sensitive data captured on screens means annotation cannot be crowd-sourced, the costs associated with manual annotation are large, and there is a need to examine the utility of unsupervised and semi-supervised methods for digital screenshot classification. This work explores the implications of applying clustering to large screenshot sets when only a limited amount of labels is available. We develop a framework for combining K-Means clustering with active learning to efficiently leverage labeled and unlabeled samples, with the goal of discovering latent classes and describing a large collection of screenshot data. We test whether SVM-embedded or XGBoost-embedded solutions for class probability propagation provide better-formed cluster configurations. Visual and textual vector representations of the screenshot images are derived and combined to assess the relative contribution of multi-modal features to overall performance.
http://arxiv.org/abs/1901.02701
In order to identify and prevent tea leaf diseases effectively, a convolutional neural network (CNN) was used for image recognition of diseased tea leaves. First, image segmentation and data augmentation were used to preprocess the images, which were then input into the network for training. Second, to reach a higher recognition accuracy, the learning rate and number of iterations were tuned, and dropout was added where over-fitting occurred. Finally, the experimental results show that the recognition accuracy of the CNN is 93.75%, while the accuracies of an SVM and a BP neural network are 89.36% and 87.69%, respectively. The CNN-based recognition algorithm therefore classifies better and can effectively improve the efficiency of tea leaf disease recognition.
http://arxiv.org/abs/1901.02694
Do we know what the different filters of a face network represent? Can we use this filter information to train other tasks without transfer learning? For instance, can age, head pose, emotion and other face related tasks be learned from face recognition network without transfer learning? Understanding the role of these filters allows us to transfer knowledge across tasks and take advantage of large data sets in related tasks. Given a pretrained network, we can infer which tasks the network generalizes for and the best way to transfer the information to a new task.
http://arxiv.org/abs/1901.02675
In this paper, we introduce a dynamic fiducial marker which can change its appearance according to the spatiotemporal requirements of the visual perception task of a mobile robot using a camera as its sensor. We present a control scheme that dynamically changes the appearance of the marker in order to increase the range of detection and to ensure better accuracy at close range. The marker control takes into account the camera-to-marker distance (which influences the scale of the marker in image coordinates) to select which fiducial markers to display. Hence, we realize a tight coupling between the visual pose control of the mobile robot and the appearance of the dynamic fiducial marker. Additionally, we discuss the practical implications of time delays due to processing time and communication delays between the robot and the marker. Finally, we propose a real-time dynamic marker visual servoing control scheme for quadcopter landing and evaluate its performance on a real-world example.
http://arxiv.org/abs/1709.04981
In this paper, we investigate the influence of the spatial configuration of a number of $n \geq 4$ control points on the accuracy and robustness of space resection methods, e.g. as used by a fiducial marker for pose estimation. We find robust configurations of control points by minimizing the first-order perturbed solution of the DLT algorithm, which is equivalent to minimizing the condition number of the data matrix. An empirical statistical evaluation is presented, verifying that these optimized control point configurations not only increase the performance of DLT homography estimation but also improve the performance of planar pose estimation methods like IPPE and EPnP, including the iterative minimization of the reprojection error, which is the most accurate algorithm. We provide the characteristics of stable control point configurations for real-world noisy camera data; they are practically independent of the camera pose and form certain symmetric patterns dependent on the number of points. Finally, we present a comparison of optimized configurations versus the number of control points.
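The objective being minimized is cheap to evaluate for any candidate configuration. The sketch below builds the standard 2N x 9 DLT data matrix for a planar homography and scores a control point configuration by its condition number; the helper names are ours.

import numpy as np

def dlt_matrix(pts, proj):
    """Standard 2N x 9 DLT matrix for correspondences
    (x, y) -> (u, v) between the marker plane and the image."""
    rows = []
    for (x, y), (u, v) in zip(pts, proj):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    return np.asarray(rows)

def configuration_score(pts, proj):
    # Lower condition number -> more robust homography / pose estimation.
    return np.linalg.cond(dlt_matrix(pts, proj))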
http://arxiv.org/abs/1803.03025
Activation functions play a crucial role in neural networks, as they are the nonlinearities to which the success of deep learning has been attributed. One of the currently most popular activation functions is ReLU, but several competitors have recently been proposed or ‘discovered’, including LReLU functions and swish. While most works compare newly proposed activation functions on a few tasks (usually from image classification) and against few competitors (usually ReLU), we perform the first large-scale comparison of 21 activation functions across eight different NLP tasks. We find that a largely unknown activation function, the so-called penalized tanh, performs most stably across all tasks. We also show that it can successfully replace the sigmoid and tanh gates in LSTM cells, leading to a 2 percentage point (pp) improvement over the standard choices on a challenging NLP task.
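For reference, the penalized tanh (Xu et al., 2016) simply scales the negative half of tanh; a short PyTorch version with the commonly used slope of 0.25:

import torch

def penalized_tanh(x, a=0.25):
    # tanh(x) for x > 0, a * tanh(x) otherwise (Xu et al., 2016).
    t = torch.tanh(x)
    return torch.where(x > 0, t, a * t)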
http://arxiv.org/abs/1901.02671
Hashing has been widely applied to multimodal retrieval on large-scale multimedia data due to its efficiency in computation and storage. In particular, deep hashing has received unprecedented research attention in recent years owing to its strong retrieval performance. However, most existing deep hashing methods learn binary hash codes by preserving the similarity relationship without exploiting the semantic labels, which results in suboptimal binary codes. In this work, we propose a novel Deep Semantic Multimodal Hashing Network (DSMHN) for scalable multimodal retrieval. In DSMHN, two sets of modality-specific hash functions are jointly learned by explicitly preserving both the inter-modality similarities and the intra-modality semantic labels. Specifically, under the assumption that the learned hash codes should be optimal for task-specific classification, two stream networks are jointly trained to learn the hash functions by embedding the semantic labels into the resultant hash codes. Unlike previous deep hashing methods, which are tied to particular forms of loss functions, our deep hashing framework can be flexibly integrated with different types of loss functions. In addition, the bit balance property is investigated to generate binary codes in which each bit has a 50% probability of being +1 or -1. Moreover, a unified deep multimodal hashing framework is proposed to learn compact and high-quality hash codes by simultaneously exploiting feature representation learning, inter-modality similarity preserving learning, semantic label preserving learning, and hash function learning with a bit-balance constraint. We conduct extensive experiments on both unimodal and cross-modal retrieval tasks on three widely-used multimodal retrieval datasets. The experimental results demonstrate that DSMHN significantly outperforms state-of-the-art methods.
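The bit balance property translates naturally into a penalty that pushes each bit's batch mean toward zero, so that each bit ends up +1 or -1 roughly half the time. A sketch of such a term on relaxed codes; the paper's exact constraint formulation may differ.

import torch

def bit_balance_penalty(codes):
    """codes: (batch, n_bits) relaxed hash codes in [-1, 1].
    A zero batch-mean per bit approximates each bit being
    +1 or -1 with 50% probability."""
    return codes.mean(dim=0).pow(2).sum()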
http://arxiv.org/abs/1901.02662
We propose a weakly supervised method using two algorithms to predict object bounding boxes given only an image classification dataset. The first algorithm is a simple Fully Convolutional Network (FCN) trained to classify object instances. We exploit the property that an FCN returns a mask for images larger than its training images: at test time, an image pyramid is passed through the network to obtain a preliminary segmentation mask. The second algorithm, a Convolutional Encoder-Decoder (ConvAE), refines the FCN output mask into final output bounding boxes. The ConvAE is trained to localize objects on an artificially generated dataset of output segmentation masks. We demonstrate the effectiveness of this method in localizing objects on grocery shelves, where annotating data for object detection is hard due to the variety of objects. The method can be extended to any problem domain where collecting images of objects is easy and annotating their coordinates is hard.
http://arxiv.org/abs/1803.06813
A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just like it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, while genetic relationships—a convenient benchmark used for evaluation in previous work—appear to be a confounding factor. Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another.
http://arxiv.org/abs/1901.02646
Aggregating extra features from a novel modality brings great advantages for building a robust pedestrian detector under adverse illumination conditions. However, misaligned imagery persists in the multispectral scenario and degrades detector performance in a non-trivial way. In this paper, we first present and explore the cross-modality disparity problem in multispectral pedestrian detection, providing insights into the utilization of multimodal inputs. To further address this issue, we then propose a novel framework including a region feature alignment module and a region of interest (RoI) jittering training strategy. Moreover, dense, high-quality, and modality-independent color-thermal annotation pairs are provided to scrub the large-scale KAIST dataset to benefit future multispectral detection research. Extensive experiments demonstrate that the proposed approach improves the robustness of the detector by a large margin and achieves state-of-the-art performance with high efficiency. Code and data will be made publicly available.
http://arxiv.org/abs/1901.02645
The ability to look multiple times through a series of pose-adjusted glimpses is fundamental to human vision. This critical faculty allows us to understand highly complex visual scenes. Short-term memory plays an integral role in aggregating the information obtained from these glimpses and informing our interpretation of the scene. Computational models have attempted to address glimpsing and visual attention but have failed to incorporate the notion of memory. We introduce a novel, biologically inspired visual working memory architecture that we term the Hebb-Rosenblatt memory. We subsequently introduce a fully differentiable Short Term Attentive Working Memory model (STAWM) which uses transformational attention to learn a memory over each image it sees. The state of our Hebb-Rosenblatt memory is embedded in STAWM as the weight space of a layer. By projecting different queries through this layer we can obtain goal-oriented latent representations for tasks including classification and visual reconstruction. Our model obtains highly competitive classification performance on MNIST and CIFAR-10. As demonstrated on the CelebA dataset, to perform reconstruction the model learns to make a sequence of updates to a canvas which constitute a parts-based representation. Classification with the self-supervised representation obtained from MNIST is shown to be in line with state-of-the-art models (none of which use a visual attention mechanism). Finally, we show that STAWM can be trained under the dual constraints of classification and reconstruction to provide an interpretable visual sketchpad which helps open the ‘black box’ of deep learning.
http://arxiv.org/abs/1901.03665
Multiple object tracking has been a challenging field, mainly due to noisy detection sets and identity switches caused by occlusion and similar appearance among nearby targets. In this work, we propose an adaptive model that learns online the relatively long-term appearance changes of each target. The proposed model is compatible with any feature of fixed dimension, or combinations thereof, whose learning rates are dynamically controlled by adaptive update and spatial weighting schemes. To handle occlusion and nearby objects sharing similar appearance, we also design cross-matching and re-identification schemes based on the proposed adaptive appearance models. Additionally, 3D geometry information is effectively incorporated into our formulation for data association. The proposed method outperforms all the state-of-the-art methods on the MOTChallenge 3D benchmark and achieves real-time computation with only a standard desktop CPU. It has also shown superior performance over the state-of-the-art on the 2D benchmark of MOTChallenge.
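The core of such an online appearance model is an exponential moving average whose learning rate is modulated per update. A minimal sketch; the confidence-scaled rate below stands in for the paper's specific adaptive update and spatial weighting schemes.

import numpy as np

def update_appearance(template, feature, match_conf, base_rate=0.1):
    """template, feature: fixed-dimension appearance vectors.
    match_conf in [0, 1]: confidence of the current association."""
    rate = base_rate * match_conf       # learn faster on confident matches
    return (1.0 - rate) * template + rate * feature

def appearance_distance(template, feature):
    # Cosine distance used for cross-matching / re-identification.
    denom = np.linalg.norm(template) * np.linalg.norm(feature) + 1e-12
    return 1.0 - template @ feature / denom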
http://arxiv.org/abs/1901.02626
This paper proposes a text summarization approach for factual reports using a deep learning model. The approach consists of three phases: feature extraction, feature enhancement, and summary generation, which work together to assimilate core information and generate a coherent, understandable summary. We explore various features to improve the set of sentences selected for the summary, and use a Restricted Boltzmann Machine to enhance and abstract those features, improving resultant accuracy without losing any important information. The sentences are scored based on those enhanced features and an extractive summary is constructed. Experimentation carried out on several articles demonstrates the effectiveness of the proposed approach. Source code available at: https://github.com/vagisha-nidhi/TextSummarizer
http://arxiv.org/abs/1708.04439
Object trackers based on Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance on recent tracking benchmarks, but they suffer from slow computational speed. The high computational load arises from extracting the feature maps of the candidate and training patches in every video frame. The candidate and training patches are typically placed randomly around the previous target location and the estimated target location, respectively. In this paper, we propose novel schemes to speed up the processing of CNN-based trackers. We input the whole region of interest once to the CNN to eliminate the redundant computations of the random candidate patches. In addition to classifying each candidate patch as an object or background, we adapt the CNN to classify the target location inside the object patches as a coarse localization step, and we employ bilinear interpolation of the CNN feature maps as a fine localization step. Moreover, bilinear interpolation is exploited to generate CNN feature maps of the training patches without actually forwarding the training patches through the network, which achieves a significant reduction of the required computations. Our tracker does not rely on offline video training. It achieves competitive performance on the OTB benchmark with an 8x speed improvement compared to the equivalent tracker.
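Both the fine-localization step and the training-patch trick reduce to sampling a CNN feature map at fractional locations. A minimal PyTorch sketch of bilinear sampling from a region-of-interest feature map; the coordinate convention and names are illustrative, not the paper's code.

import torch
import torch.nn.functional as F

def sample_patch_features(feat, box, out_size=7):
    """feat: (1, C, H, W) feature map of the whole region of interest.
    box: (x0, y0, x1, y1) in normalized [-1, 1] feature coordinates.
    Bilinear sampling replaces a CNN forward pass per patch."""
    x0, y0, x1, y1 = box
    xs = torch.linspace(x0, x1, out_size)
    ys = torch.linspace(y0, y1, out_size)
    grid = torch.stack(torch.meshgrid(ys, xs, indexing='ij'), dim=-1)
    grid = grid.flip(-1).unsqueeze(0)     # grid_sample expects (x, y) order
    return F.grid_sample(feat, grid, mode='bilinear', align_corners=True)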
http://arxiv.org/abs/1901.02620
The noetic end-to-end response selection challenge, one track in the Dialog System Technology Challenges 7 (DSTC7), aims to push the state of the art in utterance classification for real-world goal-oriented dialog systems, where participants need to select the correct next utterance from a set of candidates given the multi-turn context. This paper describes our systems, which ranked first on both datasets under this challenge: one focused and small (Advising) and the other more diverse and large (Ubuntu). Previous state-of-the-art models use hierarchy-based (utterance-level and token-level) neural networks to explicitly model the interactions among different turns' utterances for context modeling. In this paper, we investigate a sequential matching model based only on the chain sequence for multi-turn response selection. Our results demonstrate that the potential of sequential matching approaches has not yet been fully exploited for multi-turn response selection. In addition to ranking first in the challenge, the proposed model outperforms all previous models, including state-of-the-art hierarchy-based models, and achieves new state-of-the-art performance on two large-scale public multi-turn response selection benchmark datasets.
http://arxiv.org/abs/1901.02609
Safe and efficient autonomous driving maneuvers in an interactive and complex environment can be considerably challenging due to the unpredictable actions of surrounding agents, which may be cooperative or adversarial in their interactions with the ego vehicle. One state-of-the-art approach is to apply Reinforcement Learning (RL) to learn a time-sequential driving policy that executes a proper control strategy or tracking trajectory in dynamic situations. However, direct application of RL algorithms does not deal satisfactorily with cases in the autonomous driving domain, mainly due to the complex driving environment and continuous action space. In this paper, we adopt Q-learning as our basic learning framework and design a unique format of the Q-function approximator, consisting of neural networks, to handle the continuous action space challenge. The learning model is presented in a closed form of continuous control variables and trained in a simulation platform that we have developed with embedded properties of real-time vehicle interactions. The proposed algorithm avoids invoking an additional actor network that learns to take actions, as in actor-critic algorithms. At the same time, some prior knowledge of vehicle dynamics is fed into the model to assist learning. We test our algorithm on a challenging use case, the lane change maneuver, to verify the practicability and feasibility of the proposed approach. Results from accumulated rewards and vehicle performance show that the RL vehicle agents successfully learn a safe, comfortable and efficient driving policy as defined in the reward function.
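The abstract does not spell out the Q-function format; one established way to obtain the greedy continuous action in closed form without an actor network is a quadratic, NAF-style parameterization (Gu et al., 2016). The sketch below shows that generic technique, offered as an assumption about the design space rather than the authors' exact model.

import torch
import torch.nn as nn

class QuadraticQ(nn.Module):
    """Q(s, a) = V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)).
    The greedy continuous action is simply a* = mu(s), so no
    separate actor network is needed."""
    def __init__(self, s_dim, a_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(s_dim, hidden), nn.ReLU())
        self.v = nn.Linear(hidden, 1)
        self.mu = nn.Linear(hidden, a_dim)
        self.l = nn.Linear(hidden, a_dim * a_dim)
        self.a_dim = a_dim

    def forward(self, s, a):
        h = self.body(s)
        L = self.l(h).view(-1, self.a_dim, self.a_dim).tril()
        P = L @ L.transpose(1, 2)               # positive semi-definite
        d = (a - self.mu(h)).unsqueeze(-1)
        adv = -0.5 * (d.transpose(1, 2) @ P @ d).squeeze(-1)
        return self.v(h) + adv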
http://arxiv.org/abs/1803.09200
Current UAV-recorded datasets are mostly limited to action recognition and object tracking, whereas gesture signal datasets have mostly been recorded in indoor spaces. Currently, there is no publicly available outdoor-recorded video dataset for UAV commanding signals. Gesture signals can be used effectively with UAVs by leveraging a UAV's visual sensors and operational simplicity. To fill this gap and enable research in wider application areas, we present a UAV gesture signals dataset recorded in an outdoor setting. We selected 13 gestures suitable for basic UAV navigation and command from general aircraft handling and helicopter handling signals. We provide 119 high-definition video clips consisting of 37151 frames. The overall baseline gesture recognition performance computed using a Pose-based Convolutional Neural Network (P-CNN) is 91.9%. All frames are annotated with body joints and gesture classes in order to extend the dataset's applicability to a wider research area, including gesture recognition, action recognition, human pose recognition and situation awareness.
http://arxiv.org/abs/1901.02602
The scientific and technological revolution of the Internet of Things has begun in the area of oceanography. Historically, humans have observed the ocean from an external viewpoint in order to study it. In recent years, however, changes have occurred in the ocean, and laboratories have been built on the seafloor. Approximately 70.8% of the Earth's surface is covered by oceans and rivers. The Ocean of Things is expected to be important for disaster prevention, ocean-resource exploration, and underwater environmental monitoring. Unlike traditional wireless sensor networks, the Ocean Network has its own unique features, such as low reliability and narrow bandwidth, which pose great challenges. Furthermore, the integration of the Ocean Network with artificial intelligence has become a topic of increasing interest for oceanology researchers. The Cognitive Ocean Network (CONet) will become the mainstream of future ocean science and engineering developments. In this article, we define the CONet. The contributions of the paper are as follows: (1) a CONet architecture is proposed and described in detail; (2) important and useful demonstration applications of the CONet are proposed; and (3) future trends in CONet research are presented.
http://arxiv.org/abs/1901.06253
We address weakly-supervised action alignment and segmentation in videos, where only the order of occurring actions is available during training. We propose Discriminative Differentiable Dynamic Time Warping (D${}^3$TW), which is the first discriminative model for weak ordering supervision. This allows us to bypass the degenerated sequence problem usually encountered in previous work. The key technical challenge for discriminative modeling with weak-supervision is that the loss function of the ordering supervision is usually formulated using dynamic programming and is thus not differentiable. We address this challenge by continuous relaxation of the min-operator in dynamic programming and extend the DTW alignment loss to be differentiable. The proposed D${}^3$TW innovatively solves sequence alignment with discriminative modeling and end-to-end training, which substantially improves the performance in weakly supervised action alignment and segmentation tasks. We show that our model outperforms the current state-of-the-art across three evaluation metrics in two challenging datasets.
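The relaxation of the min-operator can be made concrete with the soft-minimum used in soft-DTW (Cuturi and Blondel, 2017): replacing min by a smoothed log-sum-exp makes the dynamic-programming recursion differentiable. A minimal sketch of the relaxed operator and a plain DTW recursion built on it; D${}^3$TW's discriminative loss and ordering constraints are not reproduced here.

import torch

def soft_min(values, gamma=1.0):
    # Smooth, differentiable relaxation of min (exact min as gamma -> 0).
    return -gamma * torch.logsumexp(-values / gamma, dim=-1)

def soft_dtw(D, gamma=1.0):
    """D: (n, m) pairwise cost matrix between two sequences."""
    n, m = D.shape
    inf = torch.tensor(float('inf'))
    R = [[inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = torch.tensor(0.0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = torch.stack([R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]])
            R[i][j] = D[i - 1, j - 1] + soft_min(prev, gamma)
    return R[n][m]      # differentiable alignment cost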
http://arxiv.org/abs/1901.02598
State-of-the-art scene text detection techniques predict quadrilateral boxes, which are prone to localization errors when dealing with long or curved text lines in scenes. This paper presents a novel multi-scale shape regression network (MSR) that is capable of accurately locating scene texts of arbitrary orientations, shapes and lengths. MSR detects scene texts by predicting dense text boundary points instead of sparse quadrilateral vertices, which often suffer from regression errors on long text lines. Detection by linking of dense boundary points also enables accurate localization of scene texts of arbitrary orientations and shapes, whereas most existing techniques using quadrilaterals often include undesired background in the ensuing text recognition. Additionally, the multi-scale network extracts and fuses features at different scales concurrently and seamlessly, which demonstrates superb tolerance to text scale variation. Extensive experiments over several public datasets show that MSR obtains superior detection performance for both curved and arbitrarily oriented text lines of different lengths, e.g. an 80.7 f-score on CTW1500 and an 81.7 f-score on MSRA-TD500.
http://arxiv.org/abs/1901.02596
Modeling buildings' heat dynamics is a complex process which depends on various factors including weather, building thermal capacity, insulation preservation, and residents' behavior. Gray-box models offer a causal inference of those dynamics, expressed in a few parameters specific to built environments. These parameters can provide compelling insights into the characteristics of building artifacts and have various applications such as forecasting HVAC usage and indoor temperature control monitoring of built environments. In this paper, we present a systematic study of modeling buildings' thermal characteristics and thus deriving the parameters of built conditions with a Bayesian approach. We build a Bayesian state-space model that can adapt and incorporate buildings' thermal equations, and propose a generalized solution that can easily incorporate prior knowledge regarding the parameters. We show that a faster approximate approach using variational inference for parameter estimation can provide parameters similar to those of a more time-consuming Markov Chain Monte Carlo (MCMC) approach. We perform extensive evaluations on two datasets to understand the generative process and show that the Bayesian approach is more interpretable. We further study the effects of prior selection on the model parameters, and of transfer learning, where we learn parameters from one season and use them to fit the model in another. We perform extensive evaluations on controlled and real data traces to estimate buildings' parameters within a 95% credible interval.
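A common gray-box form is a discretized resistance-capacitance (RC) state equation whose few parameters (here R and C) are what the Bayesian machinery infers. A sketch of a first-order version; the paper's thermal equations may include more states than this illustration.

import numpy as np

def simulate_rc(T0, T_out, Q_hvac, R, C, dt=3600.0):
    """First-order gray-box model:
    C * dT_in/dt = (T_out - T_in) / R + Q_hvac.
    T0: initial indoor temperature; T_out, Q_hvac: arrays per time step;
    R: thermal resistance (K/W), C: thermal capacitance (J/K)."""
    T = np.empty(len(T_out) + 1)
    T[0] = T0
    for k in range(len(T_out)):
        dT = ((T_out[k] - T[k]) / R + Q_hvac[k]) * dt / C   # Euler step
        T[k + 1] = T[k] + dT
    return T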
http://arxiv.org/abs/1901.07469
Photo-identification (photo-id) of dolphin individuals is a commonly used technique in ecological sciences to monitor the state and health of individuals, as well as to study the social structure and distribution of a population. Traditional photo-id involves a laborious manual process of matching each dolphin fin photograph captured in the field to a catalogue of known individuals. We examine this problem in the context of open-set recognition and utilise a triplet loss function to learn a compact representation of fin images in a Euclidean embedding, where the Euclidean distance metric represents fin similarity. We show that this compact representation can be successfully learnt from a fairly small (in the deep learning context) training set and still generalise well to out-of-sample identities (completely new dolphin individuals), with top-1 and top-5 test set (37 individuals) accuracy of 90.5±2 and 93.6±1 percent. In the presence of 1200 distractors, top-1 accuracy dropped by 12%; however, top-5 accuracy saw only a 2.8% drop.
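The embedding is trained with the standard triplet objective: pull an anchor fin image toward another image of the same individual and push it at least a margin away from a different individual. A minimal PyTorch sketch; the margin value and the L2 normalization are illustrative defaults.

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Embeddings are L2-normalized so that Euclidean distance
    in the embedding reflects fin similarity."""
    a, p, n = (F.normalize(e, dim=-1) for e in (anchor, positive, negative))
    d_ap = (a - p).pow(2).sum(-1)   # anchor-positive distance
    d_an = (a - n).pow(2).sum(-1)   # anchor-negative distance
    return F.relu(d_ap - d_an + margin).mean()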
http://arxiv.org/abs/1901.03662