We develop a likelihood free inference procedure for conditioning a probabilistic model on a predicate. A predicate is a Boolean valued function which expresses a yes/no question about a domain. Our contribution, which we call predicate exchange, constructs a softened predicate which takes value in the unit interval [0, 1] as opposed to a simply true or false. Intuitively, 1 corresponds to true, and a high value (such as 0.999) corresponds to “nearly true” as determined by a distance metric. We define Boolean algebra for soft predicates, such that they can be negated, conjoined and disjoined arbitrarily. A softened predicate can serve as a tractable proxy to a likelihood function for approximate posterior inference. However, to target exact inference, we temper the relaxation by a temperature parameter, and add a accept/reject phase use to replica exchange Markov Chain Mont Carlo, which exchanges states between a sequence of models conditioned on predicates at varying temperatures. We describe a lightweight implementation of predicate exchange that it provides a language independent layer that can be implemented on top of existingn modeling formalisms.
http://arxiv.org/abs/1901.05437
In this paper we propose a new training loop for deep reinforcement learning agents with an evolutionary generator. Evolutionary procedural content generation has been used in the creation of maps and levels for games before. Our system incorporates an evolutionary map generator to construct a training curriculum that is evolved to maximize loss within the state-of-the-art Double Dueling Deep Q Network architecture with prioritized replay. We present a case-study in which we prove the efficacy of our new method on a game with a discrete, large action space we made called Attackers and Defenders. Our results demonstrate that training on an evolutionarily-curated curriculum (directed sampling) of maps both expedites training and improves generalization when compared to a network trained on an undirected sampling of maps.
http://arxiv.org/abs/1901.05431
Predicting structured outputs such as semantic segmentation relies on expensive per-pixel annotations to learn strong supervised models like convolutional neural networks. However, these models trained on one data domain may not generalize well to other domains unequipped with annotations for model finetuning. To avoid the labor-intensive process of annotation, we develop a domain adaptation method to adapt the source data to the unlabeled target domain. To this end, we propose to learn discriminative feature representations of patches based on label histograms in the source domain, through the construction of a clustered space. With such representations as guidance, we then use an adversarial learning scheme to push the feature representations in target patches to the closer distributions in source ones. In addition, we show that our framework can integrate a global alignment process with the proposed patch-level alignment and achieve state-of-the-art performance on semantic segmentation. Extensive ablation studies and experiments are conducted on numerous benchmark datasets with various settings, such as synthetic-to-real and cross-city scenarios.
http://arxiv.org/abs/1901.05427
The majority of conversations a dialogue agent sees over its lifetime occur after it has already been trained and deployed, leaving a vast store of potential training signal untapped. In this work, we propose the self-feeding chatbot, a dialogue agent with the ability to extract new training examples from the conversations it participates in. As our agent engages in conversation, it also estimates user satisfaction in its responses. When the conversation appears to be going well, the user’s responses become new training examples to imitate. When the agent believes it has made a mistake, it asks for feedback; learning to predict the feedback that will be given improves the chatbot’s dialogue abilities further. On the PersonaChat chit-chat dataset with over 131k training examples, we find that learning from dialogue with a self-feeding chatbot significantly improves performance, regardless of the amount of traditional supervision.
http://arxiv.org/abs/1901.05415
The socioeconomic status of people depends on a combination of individual characteristics and environmental variables, thus its inference from online behavioral data is a difficult task. Attributes like user semantics in communication, habitat, occupation, or social network are all known to be determinant predictors of this feature. In this paper we propose three different data collection and combination methods to first estimate and, in turn, infer the socioeconomic status of French Twitter users from their online semantics. Our methods are based on open census data, crawled professional profiles, and remotely sensed, expert annotated information on living environment. Our inference models reach similar performance of earlier results with the advantage of relying on broadly available datasets and of providing a generalizable framework to estimate socioeconomic status of large numbers of Twitter users. These results may contribute to the scientific discussion on social stratification and inequalities, and may fuel several applications.
http://arxiv.org/abs/1901.05389
Accounting for 26% of all new cancer cases worldwide, breast cancer remains the most common form of cancer in women. Although early breast cancer has a favourable long-term prognosis, roughly a third of patients suffer from a suboptimal aesthetic outcome despite breast conserving cancer treatment. Clinical-quality 3D modelling of the breast surface therefore assumes an increasingly important role in advancing treatment planning, prediction and evaluation of breast cosmesis. Yet, existing 3D torso scanners are expensive and either infrastructure-heavy or subject to motion artefacts. In this paper we employ a single consumer-grade RGBD camera with an ICP-based registration approach to jointly align all points from a sequence of depth images non-rigidly. Subtle body deformation due to postural sway and respiration is successfully mitigated leading to a higher geometric accuracy through regularised locally affine transformations. We present results from 6 clinical cases where our method compares well with the gold standard and outperforms a previous approach. We show that our method produces better reconstructions qualitatively by visual assessment and quantitatively by consistently obtaining lower landmark error scores and yielding more accurate breast volume estimates.
http://arxiv.org/abs/1901.05377
We propose a new architecture that learns to attend to different Convolutional Neural Networks (CNN) layers (i.e., different levels of abstraction) and different spatial locations (i.e., specific layers within a given feature map) in a sequential manner to perform the task at hand. Specifically, at each Recurrent Neural Network (RNN) timestep, a CNN layer is selected and its output is processed by a spatial soft-attention mechanism. We refer to this architecture as the Unified Attention Network (UAN), since it combines the “what” and “where” aspects of attention, i.e., “what” level of abstraction to attend to, and “where” should the network look at. We demonstrate the effectiveness of this approach on two computer vision tasks: (i) image-based camera pose and orientation regression and (ii) indoor scene classification. We evaluate our method on standard benchmarks for camera localization (Cambridge, 7-Scene, and TUM-LSI datasets) and for scene classification (MIT-67 indoor dataset), and show that our method improves upon the results of previous methods. Empirically, we show that combining “what” and “where” aspects of attention improves network performance on both tasks.
http://arxiv.org/abs/1901.05376
Recent research on face detection, which is focused primarily on improving accuracy of detecting smaller faces, attempt to develop new anchor design strategies to facilitate increased overlap between anchor boxes and ground truth faces of smaller sizes. In this work, we approach the problem of small face detection with the motivation of enriching the feature maps using a density map estimation module. This module, inspired by recent crowd counting/density estimation techniques, performs the task of estimating the per pixel density of people/faces present in the image. Output of this module is employed to accentuate the feature maps from the backbone network using a feature enrichment module before being used for detecting smaller faces. The proposed approach can be used to complement recent anchor-design based novel methods to further improve their results. Experiments conducted on different datasets such as WIDER, FDDB and Pascal-Faces demonstrate the effectiveness of the proposed approach.
http://arxiv.org/abs/1901.05375
Bedside caregivers assess infants’ pain at constant intervals by observing specific behavioral and physiological signs of pain. This standard has two main limitations. The first limitation is the intermittent assessment of pain, which might lead to missing pain when the infants are left unattended. Second, it is inconsistent since it depends on the observer’s subjective judgment and differs between observers. The intermittent and inconsistent assessment can induce poor treatment and, therefore, cause serious long-term consequences. To mitigate these limitations, the current standard can be augmented by an automated system that monitors infants continuously and provides quantitative and consistent assessment of pain. Several automated methods have been introduced to assess infants’ pain automatically based on analysis of behavioral or physiological pain indicators. This paper comprehensively reviews the automated approaches (i.e., approaches to feature extraction) for analyzing infants’ pain and the current efforts in automatic pain recognition. In addition, it reviews the databases available to the research community and discusses the current limitations of the automated pain assessment.
http://arxiv.org/abs/1607.00331
Current benchmarks for optical flow algorithms evaluate the estimation quality by comparing their predicted flow field with the ground truth, and additionally may compare interpolated frames, based on these predictions, with the correct frames from the actual image sequences. For the latter comparisons, objective measures such as mean square errors are applied. However, for applications like image interpolation, the expected user’s quality of experience cannot be fully deduced from such simple quality measures. Therefore, we conducted a subjective quality assessment study by crowdsourcing for the interpolated images provided in one of the optical flow benchmarks, the Middlebury benchmark. We used paired comparisons with forced choice and reconstructed absolute quality scale values according to Thurstone’s model using the classical least squares method. The results give rise to a re-ranking of 141 participating algorithms w.r.t. visual quality of interpolated frames mostly based on optical flow estimation. Our re-ranking result shows the necessity of visual quality assessment as another evaluation metric for optical flow and frame interpolation benchmarks.
http://arxiv.org/abs/1901.05362
We present a simple lightweight markerless facial performance capture framework using just a monocular video input that combines Active Appearance Models for feature tracking and prior constraints on 3D shapes into an integrated objective function. 2D monocular inputs inherently lack information along the depth axis and can lead to physically implausible solutions. In order to address this loss of information, we enforce a constraint on our objective function within a probabilistic framework that uses preexisting animations obtained from accurate 3D tracking systems, thus achieving more plausible results. Our system fits a Blendshape model to tracked 2D features while also handling noise in estimation of features and camera parameters. We learn separate constraints for the upper and lower regions of the face thus maintaining flexibility. We show that using this approach, we can obtain significant improvement in tracking especially along the depth dimension. Our method uses easily obtainable prior animation data. We show that our method can generate convincing animations using only a monocular video input. We quantitatively evaluate our results comparing it with an approach using a monocular input without our spatial constraints and show that our results are closer to the ground-truth geometry. Finally, we also evaluate the effect that the choice of the Blendshape set has on the results of the solver by solving for a different set of Blendshapes and quantitatively comparing it with our previous results and to the ground truth. We show that while the choice of Blendshapes does make a difference, the use of our spatial constraints generates results that are closer to the ground truth.
http://arxiv.org/abs/1901.05355
Fabricating neural models for a wide range of mobile devices demands for a specific design of networks due to highly constrained resources. Both evolution algorithms (EA) and reinforced learning methods (RL) have been dedicated to solve neural architecture search problems. However, these combinations usually concentrate on a single objective such as the error rate of image classification. They also fail to harness the very benefits from both sides. In this paper, we present a new multi-objective oriented algorithm called MoreMNAS (Multi-Objective Reinforced Evolution in Mobile Neural Architecture Search) by leveraging good virtues from both EA and RL. In particular, we incorporate a variant of multi-objective genetic algorithm NSGA-II, in which the search space is composed of various cells so that crossovers and mutations can be performed at the cell level. Moreover, reinforced control is mixed with a natural mutating process to regulate arbitrary mutation, maintaining a delicate balance between exploration and exploitation. Therefore, not only does our method prevent the searched models from degrading during the evolution process, but it also makes better use of learned knowledge. Our experiments conducted in Super-resolution domain (SR) deliver rivalling models compared to some state-of-the-art methods with fewer FLOPS.
https://arxiv.org/abs/1901.01074
Advances in hybrid organic/inorganic architectures for optoelectronics can be achieved by understanding how the atomic and electronic degrees of freedom cooperate or compete to yield the desired functional properties. Here we show how work-function changes are modulated by the structure of the organic components in model hybrid systems. We consider two cyano-quinodimethane derivatives (F4-TCNQ and F6-TCNNQ), which are strong electron-acceptor molecules, adsorbed on H-Si(111). From systematic structure searches employing range-separated hybrid HSE06 functional including many body van der Waals contributions, we predict that despite their similar composition, these molecules adsorb with significantly different densely-packed geometries in the first layer, due to strong intermolecular interaction. F6-TCNNQ shows a much stronger intralayer interaction (primarily due to van der Waals contributions) than F4-TCNQ in multilayered structures. The densely-packed geometries induce a large interface-charge rearrangement that result in a work-function increase of 1.11 and 1.76 eV for F4-TCNQ and F6-TCNNQ, respectively. Nuclear fluctuations at room temperature produce a wide distribution of work-function values, well modeled by a normal distribution with {\sigma}=0.17 eV. We corroborate our findings with experimental evidence of pronounced island formation for F6-TCNNQ on H-Si(111) and with the agreement of trends between predicted and measured work-function changes.
https://arxiv.org/abs/1811.00037
Sequential decision-making (SDM) plays a key role in intelligent robotics, and can be realized in very different ways, such as supervised learning, automated reasoning, and probabilistic planning. The three families of methods follow different assumptions and have different (dis)advantages. In this work, we aim at a robot SDM framework that exploits the complementary features of learning, reasoning, and planning. We utilize long short-term memory (LSTM), for passive state estimation with streaming sensor data, and commonsense reasoning and probabilistic planning (CORPP) for active information collection and task accomplishment. In experiments, a mobile robot is tasked with estimating human intentions using their motion trajectories, declarative contextual knowledge, and human-robot interaction (dialog-based and motion-based). Results suggest that our framework performs better than its no-learning and no-reasoning versions in a real-world office environment.
http://arxiv.org/abs/1901.05322
I assess the extent to which the recently introduced BERT model captures English syntactic phenomena, using (1) naturally-occurring subject-verb agreement stimuli; (2) “coloreless green ideas” subject-verb agreement stimuli, in which content words in natural sentences are randomly replaced with words sharing the same part-of-speech and inflection; and (3) manually crafted stimuli for subject-verb agreement and reflexive anaphora phenomena. The BERT model performs remarkably well on all cases.
http://arxiv.org/abs/1901.05287
Semantic role labeling (SRL) aims to discover the predicateargument structure of a sentence. End-to-end SRL without syntactic input has received great attention. However, most of them focus on either span-based or dependency-based semantic representation form and only show specific model optimization respectively. Meanwhile, handling these two SRL tasks uniformly was less successful. This paper presents an end-to-end model for both dependency and span SRL with a unified argument representation to deal with two different types of argument annotations in a uniform fashion. Furthermore, we jointly predict all predicates and arguments, especially including long-term ignored predicate identification subtask. Our single model achieves new state-of-the-art results on both span (CoNLL 2005, 2012) and dependency (CoNLL 2008, 2009) SRL benchmarks.
http://arxiv.org/abs/1901.05280
Appropriate modeling of a surveillance scene is essential for detection of anomalies in road traffic. Learning usual paths can provide valuable insight of road traffic conditions and thus can help in identifying abnormal routes taken by commuters/vehicles. If usual traffic paths are learned in a nonparametric way, manual interventions for marking roads can be avoided. We propose an unsupervised and nonparametric method to learn frequently used paths from the tracks of moving objects in $\Theta(kn)$ time, where $k$ is the number of paths and $n$ represents the number of tracks. In the proposed method, temporal dependencies of the moving objects are taken into consideration to make the clustering meaningful using Temporally Incremental Gravity Model (TIGM)-Dynamic State Model (DSM). In addition, the distance-based scene learning makes it realistically intuitive to estimate the model parameters. Experimental validation reveals that the proposed method can learn a scene quickly without prior knowledge about the number of paths ($k$). We have compared the results with state-of-the-art methods. We also highlight the advantages of the proposed method over these techniques, especially for traffic monitoring applications. Further, we extend the model to represent notable traffic dynamics of a scene, that can be used for administrative decision making to control traffic at junctions or crowded places. We have also applied the proposed model to understand its effectiveness in clustering network traffic data.
http://arxiv.org/abs/1803.06613
We present a detailed description and reference implementation of preprocessing steps necessary to prepare the public Retrospective Image Registration Evaluation (RIRE) dataset for the task of magnetic resonance imaging (MRI) to X-ray computed tomography (CT) translation. Furthermore we describe and implement three state of the art convolutional neural network (CNN) and generative adversarial network (GAN) models where we report statistics and visual results of two of them.
http://arxiv.org/abs/1901.05259
Current robot architectures for modeling interaction behavior are not well suited to the dual task of sequencing discrete actions and incorporating information instantly. Additionally, for communication based on body motion, actions also serve as cues for negotiating interaction alternatives and to enable timely interventions. The paper presents a dynamical system based on the stable heteroclinic channel network, which provides a rich set of parameters to isntantly modulate motions, while maintaining a compact state graph abstraction suitable for reasoning, planning and inference.
http://arxiv.org/abs/1901.05256
We study the quantum synchronization between a pair of two-level systems inside two coupled cavities. By using a digital-analog decomposition of the master equation that rules the system dynamics, we show that this approach leads to quantum synchronization between both two-level systems. Moreover, we can identify in this digital-analog block decomposition the fundamental elements of a quantum machine learning protocol, in which the agent and the environment (learning units) interact through a mediating system, namely, the register. If we can additionally equip this algorithm with a classical feedback mechanism, which consists of projective measurements in the register, reinitialization of the register state and local conditional operations on the agent and environment subspace, a powerful and flexible quantum machine learning protocol emerges. Indeed, numerical simulations show that this protocol enhances the synchronization process, even when every subsystem experience different loss/decoherence mechanisms, and give us the flexibility to choose the synchronization state. Finally, we propose an implementation based on current technologies in superconducting circuits.
http://arxiv.org/abs/1709.08519
Irradiation experiments (IE) are an essential step in the development of High-Energy Physics (HEP) particle accelerators and detectors. They assess the radiation hardness of materials used in HEP experimental devices by simulating, in a short time, the common long-term degradation effects due to their bombardment by high-energy particles. IEs are also used in other scientific and industrial fields such as medicine (e.g., for cancer treatment, medical imaging, etc.), space/avionics (e.g., for radiation testing of payload equipment) as well as in industry (e.g., for food sterilization). Usually carried out with ionizing radiation, these complex processes require highly specialized infrastructures: the irradiation facilities. Currently, hundreds of such facilities exist worldwide. To help develop best practices and promote computer-assisted handling and management of IEs, we introduce IEDM, a new OWL-based Irradiation Experiment Data Management ontology. This paper provides an overview of the classes and properties of IEDM. Since one of the key design choices for IEDM was to maximize the reuse of existing foundational ontologies such as the Ontology of Scientific Experiments (EXPO), the Ontology of Units of Measure (OM) and the Friend-of-a-Friend Ontology (FOAF), we discuss the methodological issues of the integration of IEDM with these imported ontologies. We illustrate the use of IEDM via an actual IE recently performed at IRRAD, the CERN proton irradiation facility. Finally, we discuss other motivations for this work, including the use of IEDM for the generation of user interfaces for IE management, and their impact on our methodology.
http://arxiv.org/abs/1901.05233
Sentence embedding is a significant research topic in the field of natural language processing (NLP). Generating sentence embedding vectors reflecting the intrinsic meaning of a sentence is a key factor to achieve an enhanced performance in various NLP tasks such as sentence classification and document summarization. Therefore, various sentence embedding models based on supervised and unsupervised learning have been proposed after the advent of researches regarding the distributed representation of words. They were evaluated through semantic textual similarity (STS) tasks, which measure the degree of semantic preservation of a sentence and neural network-based supervised embedding models generally yielded state-of-the-art performance. However, these models have a limitation in that they have multiple parameters to update, thereby requiring a tremendous amount of labeled training data. In this study, we propose an efficient approach that learns a transition matrix that refines a sentence embedding vector to reflect the latent semantic meaning of a sentence. The proposed method has two practical advantages; (1) it can be applied to any sentence embedding method, and (2) it can achieve robust performance in STS tasks irrespective of the number of training examples.
http://arxiv.org/abs/1901.05219
This paper extends the earlier work on an oscillating error correction technique. Specifically, it extends the design to include further corrections, by adding new layers to the classifier through a branching method. This technique is still consistent with earlier work and neural networks in general. With this extended design, the classifier can now achieve the high levels of accuracy reported previously. A second version then adds fixed value ranges through bands, for each column or feature of the input dataset. It is shown that some of the data can be correctly classified through using fixed value ranges only, while the rest can be classified by using the classifier technique. This can possibly present the classifier in terms of a biological model of neurons and neuron links.
http://arxiv.org/abs/1811.02617
Grid maps obtained from fused sensory information are nowadays among the most popular approaches for motion planning for autonomous driving cars. In this paper, we introduce Deep Grid Net (DGN), a deep learning (DL) system designed for understanding the context in which an autonomous car is driving. DGN incorporates a learned driving environment representation based on Occupancy Grids (OG) obtained from raw Lidar data and constructed on top of the Dempster-Shafer (DS) theory. The predicted driving context is further used for switching between different driving strategies implemented within EB robinos, Elektrobit’s Autonomous Driving (AD) software platform. Based on genetic algorithms (GAs), we also propose a neuroevolutionary approach for learning the tuning hyperparameters of DGN. The performance of the proposed deep network has been evaluated against similar competing driving context estimation classifiers.
http://arxiv.org/abs/1901.05203
3D object detection is still an open problem in autonomous driving scenes. When recognizing and localizing key objects from sparse 3D inputs, autonomous vehicles suffer from a larger continuous searching space and higher fore-background imbalance compared to image-based object detection. In this paper, we aim to solve this fore-background imbalance in 3D object detection. Inspired by the recent use of focal loss in image-based object detection, we extend this hard-mining improvement of binary cross entropy to point-cloud-based object detection and conduct experiments to show its performance based on two different 3D detectors: 3D-FCN and VoxelNet. The evaluation results show up to 11.2AP gains through the focal loss in a wide range of hyperparameters for 3D object detection.
https://arxiv.org/abs/1809.06065
Current state of the art solutions in the control of an autonomous vehicle mainly use supervised end-to-end learning, or decoupled perception, planning and action pipelines. Another possible solution is deep reinforcement learning, but such a method requires that the agent interacts with its surroundings in a simulated environment. In this paper we introduce GridSim, which is an autonomous driving simulator engine running a car-like robot architecture to generate occupancy grids from simulated sensors. We use GridSim to study the performance of two deep learning approaches, deep reinforcement learning and driving behavioral learning through genetic algorithms. The deep network encodes the desired behavior in a two elements fitness function describing a maximum travel distance and a maximum forward speed, bounded to a specific interval. The algorithms are evaluated on simulated highways, curved roads and inner-city scenarios, all including different driving limitations.
http://arxiv.org/abs/1901.05195
Human language, music and a variety of animal vocalisations constitute ways of sonic communication that exhibit remarkable structural complexity. While the complexities of language and possible parallels in animal communication have been discussed intensively, reflections on the complexity of music and animal song, and their comparisons are underrepresented. In some ways, music and animal songs are more comparable to each other than to language, as propositional semantics cannot be used as as indicator of communicative success or well-formedness, and notions of grammaticality are less easily defined. This review brings together accounts of the principles of structure building in language, music and animal song, relating them to the corresponding models in formal language theory, with a special focus on evaluating the benefits of using the Chomsky hierarchy (CH). We further discuss common misunderstandings and shortcomings concerning the CH, as well as extensions or augmentations of it that address some of these issues, and suggest ways to move beyond.
http://arxiv.org/abs/1901.05180
Graph matching is an important and persistent problem in computer vision and pattern recognition for finding node-to-node correspondence between graph-structured data. However, as widely used, graph matching that incorporates pairwise constraints can be formulated as a quadratic assignment problem (QAP), which is NP-complete and results in intrinsic computational difficulties. In this paper, we present a functional representation for graph matching (FRGM) that aims to provide more geometric insights on the problem and reduce the space and time complexities of corresponding algorithms. To achieve these goals, we represent a graph endowed with edge attributes by a linear function space equipped with a functional such as inner product or metric, that has an explicit geometric meaning. Consequently, the correspondence between graphs can be represented as a linear representation map of that functional. Specifically, we reformulate the linear functional representation map as a new parameterization for Euclidean graph matching, which is associative with geometric parameters for graphs under rigid or nonrigid deformations. This allows us to estimate the correspondence and geometric deformations simultaneously. The use of the representation of edge attributes rather than the affinity matrix enables us to reduce the space complexity by two orders of magnitudes. Furthermore, we propose an efficient optimization strategy with low time complexity to optimize the objective function. The experimental results on both synthetic and real-world datasets demonstrate that the proposed FRGM can achieve state-of-the-art performance.
http://arxiv.org/abs/1901.05179
We applied pre-defined kernels also known as filters or masks developed for image processing to convolution neural network. Instead of letting neural networks find its own kernels, we used 41 different general-purpose kernels of blurring, edge detecting, sharpening, discrete cosine transformation, etc. for the first layer of the convolution neural networks. This architecture, thus named as general filter convolutional neural network (GFNN), can reduce training time by 30% with a better accuracy compared to the regular convolutional neural network (CNN). GFNN also can be trained to achieve 90% accuracy with only 500 samples. Furthermore, even though these kernels are not specialized for the MNIST dataset, we achieved 99.56% accuracy without ensemble nor any other special algorithms.
http://arxiv.org/abs/1901.07375
Dynamic Programming Languages are quite popular because they increase the programmer’s productivity. However, the absence of types in the source code makes the program written in these languages difficult to understand and virtual machines that execute these programs cannot produced optimized code. To overcome this challenge, we develop a technique to predict types of all identifiers including variables, and function return types. We propose the first implementation of $2^{nd}$ order Inside Outside Recursive Neural Networks with two variants (i) Child-Sum Tree-LSTMs and (ii) N-ary RNNs that can handle large number of tree branching. We predict the types of all the identifiers given the Abstract Syntax Tree by performing just two passes over the tree, bottom-up and top-down, keeping both the content and context representation for all the nodes of the tree. This allows these representations to interact by combining different paths from the parent, siblings and children which is crucial for predicting types. Our best model achieves 44.33\% across 21 classes and top-3 accuracy of 71.5\% on our gathered Python data set from popular Python benchmarks.
http://arxiv.org/abs/1901.05138
Several deep supervised hashing techniques have been proposed to allow for efficiently querying large image databases. However, deep supervised image hashing techniques are developed, to a great extent, heuristically often leading to suboptimal results. Contrary to this, we propose an efficient deep supervised hashing algorithm that optimizes the learned codes using an information-theoretic measure, the Quadratic Mutual Information (QMI). The proposed method is adapted to the needs of large-scale hashing and information retrieval leading to a novel information-theoretic measure, the Quadratic Spherical Mutual Information (QSMI). Apart from demonstrating the effectiveness of the proposed method under different scenarios and outperforming existing state-of-the-art image hashing techniques, this paper provides a structured way to model the process of information retrieval and develop novel methods adapted to the needs of each application.
http://arxiv.org/abs/1901.05135
Neural style transfer has drawn considerable attention from both academic and industrial field. Although visual effect and efficiency have been significantly improved, existing methods are unable to coordinate spatial distribution of visual attention between the content image and stylized image, or render diverse level of detail via different brush strokes. In this paper, we tackle these limitations by developing an attention-aware multi-stroke style transfer model. We first propose to assemble self-attention mechanism into a style-agnostic reconstruction autoencoder framework, from which the attention map of a content image can be derived. By performing multi-scale style swap on content features and style features, we produce multiple feature maps reflecting different stroke patterns. A flexible fusion strategy is further presented to incorporate the salient characteristics from the attention map, which allows integrating multiple stroke patterns into different spatial regions of the output image harmoniously. We demonstrate the effectiveness of our method, as well as generate comparable stylized images with multiple stroke patterns against the state-of-the-art methods.
http://arxiv.org/abs/1901.05127
This paper presents a novel framework for predicting shot location and type in tennis. Inspired by recent neuroscience discoveries we incorporate neural memory modules to model the episodic and semantic memory components of a tennis player. We propose a Semi Supervised Generative Adversarial Network architecture that couples these memory models with the automatic feature learning power of deep neural networks and demonstrate methodologies for learning player level behavioural patterns with the proposed framework. We evaluate the effectiveness of the proposed model on tennis tracking data from the 2012 Australian Tennis open and exhibit applications of the proposed method in discovering how players adapt their style depending on the match context.
http://arxiv.org/abs/1901.05123
The sound field separation methods can separate the target field from the interfering noises, facilitating the study of the acoustic characteristics of the target source, which is placed in a noisy environment. However, most of the existing sound field separation methods are derived in the frequency-domain, thus are best suited for separating stationary sound fields. In this paper, a time-domain sound field separation method is developed that can separate the non-stationary sound field generated by the target source over a sphere in real-time. A spherical array sets up a boundary between the target source and the interfering sources, such that the outgoing field on the array is only generated by the target source. The proposed method decomposes the pressure and the radial particle velocity measured by the array into spherical harmonics coefficients, and recoveries the target outgoing field based on the time-domain relationship between the decomposition coefficients and the theoretically derived spatial filter responses. Simulations show the proposed method can separate non-stationary sound fields both in free field and room environments, and over a longer duration with small errors. The proposed method could serve as a foundation for developing future time-domain spatial sound field manipulation algorithms.
http://arxiv.org/abs/1901.05122
Predicting lead close rates is one of the most problematic tasks in the lead generation industry. In most cases, the only available data on the prospect is the self-reported information inputted by the user on the lead form and a few other data points publicly available through social media and search engine usage. All the major market niches for lead generation [1], such as insurance, health & medical and real estate, deal with life-altering decision making that no amount of data will be ever be able to describe or predict. This paper illustrates how character-level, deep long short-term memory networks can be applied to raw user inputs to help predict close rates. The output of the model is then used as an additional, highly predictive feature to significantly boost performance of lead scoring models.
http://arxiv.org/abs/1901.05115
Prevailing user authentication schemes on smartphones rely on explicit user interaction, where a user types in a passcode or presents a biometric cue such as face, fingerprint, or iris. In addition to being cumbersome and obtrusive to the users, such authentication mechanisms pose security and privacy concerns. Passive authentication systems can tackle these challenges by frequently and unobtrusively monitoring the user’s interaction with the device. In this paper, we propose a Siamese Long Short-Term Memory network architecture for passive authentication, where users can be verified without requiring any explicit authentication step. We acquired a dataset comprising of measurements from 30 smartphone sensor modalities for 37 users. We evaluate our approach on 8 dominant modalities, namely, keystroke dynamics, GPS location, accelerometer, gyroscope, magnetometer, linear accelerometer, gravity, and rotation sensors. Experimental results find that, within 3 seconds, a genuine user can be correctly verified 97.15% of the time at a false accept rate of 0.1%.
http://arxiv.org/abs/1901.05107
Predicting the motion of a driver’s vehicle is crucial for advanced driving systems, enabling detection of potential risks towards shared control between the driver and automation systems. In this paper, we propose a variational neural network approach that predicts future driver trajectory distributions for the vehicle based on multiple sensors. Our predictor generates both a conditional variational distribution of future trajectories, as well as a confidence estimate for different time horizons. Our approach allows us to handle inherently uncertain situations, and reason about information gain from each input, as well as combine our model with additional predictors, creating a mixture of experts. We show how to augment the variational predictor with a physics-based predictor, and based on their confidence estimators, improve overall system performance. The resulting combined model is aware of the uncertainty associated with its predictions, which can help the vehicle autonomy to make decisions with more confidence. The model is validated on real-world urban driving data collected in multiple locations. This validation demonstrates that our approach improves the prediction error of a physics-based model by 25% while successfully identifying the uncertain cases with 66% accuracy.
http://arxiv.org/abs/1901.05105
3D local feature extraction and matching is the basis for solving many tasks in the area of computer vision, such as 3D registration, modeling, recognition and retrieval. However, this process commonly draws into false correspondences, due to noise, limited features, occlusion, incomplete surface and etc. In order to estimate accurate transformation based on these corrupted correspondences, numerous transformation estimation techniques have been proposed. However, the merits, demerits and appropriate application for these methods are unclear owing to that no comprehensive evaluation for the performance of these methods has been conducted. This paper evaluates eleven state-of-the-art transformation estimation proposals on both descriptor based and synthetic correspondences. On descriptor based correspondences, several evaluation items (including the performance on different datasets, robustness to different overlap ratios and the performance of these technique combined with Iterative Closest Point (ICP), different local features and LRF/A techniques) of these methods are tested on four popular datasets acquired with different devices. On synthetic correspondences, the robustness of these methods to varying percentages of correct correspondences (PCC) is evaluated. In addition, we also evaluate the efficiencies of these methods. Finally, the merits, demerits and application guidance of these tested transformation estimation methods are summarized.
http://arxiv.org/abs/1901.05104
Cloud detection is an important preprocessing step for the precise application of optical satellite imagery. In this paper, we propose a deep learning based cloud detection method named multi-scale convolutional feature fusion (MSCFF) for remote sensing images of different sensors. In the network architecture of MSCFF, the symmetric encoder-decoder module, which provides both local and global context by densifying feature maps with trainable convolutional filter banks, is utilized to extract multi-scale and high-level spatial features. The feature maps of multiple scales are then up-sampled and concatenated, and a novel multi-scale feature fusion module is designed to fuse the features of different scales for the output. The two output feature maps of the network are cloud and cloud shadow maps, which are in turn fed to binary classifiers outside the model to obtain the final cloud and cloud shadow mask. The MSCFF method was validated on hundreds of globally distributed optical satellite images, with spatial resolutions ranging from 0.5 to 50 m, including Landsat-5/7/8, Gaofen-1/2/4, Sentinel-2, Ziyuan-3, CBERS-04, Huanjing-1, and collected high-resolution images exported from Google Earth. The experimental results show that MSCFF achieves a higher accuracy than the traditional rule-based cloud detection methods and the state-of-the-art deep learning models, especially in bright surface covered areas. The effectiveness of MSCFF means that it has great promise for the practical application of cloud detection for multiple types of medium and high-resolution remote sensing images. Our established global high-resolution cloud detection validation dataset has been made available online.
http://arxiv.org/abs/1810.05801
Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to representing 3D geometry for rendering and reconstruction. These provide trade-offs across fidelity, efficiency and compression capabilities. In this work, we introduce DeepSDF, a learned continuous Signed Distance Function (SDF) representation of a class of shapes that enables high quality shape representation, interpolation and completion from partial and noisy 3D input data. DeepSDF, like its classical counterpart, represents a shape’s surface by a continuous volumetric field: the magnitude of a point in the field represents the distance to the surface boundary and the sign indicates whether the region is inside (-) or outside (+) of the shape, hence our representation implicitly encodes a shape’s boundary as the zero-level-set of the learned function while explicitly representing the classification of space as being part of the shapes interior or not. While classical SDF’s both in analytical or discretized voxel form typically represent the surface of a single shape, DeepSDF can represent an entire class of shapes. Furthermore, we show state-of-the-art performance for learned 3D shape representation and completion while reducing the model size by an order of magnitude compared with previous work.
http://arxiv.org/abs/1901.05103
In autonomous vehicle (AV) control, allowing mistakes can be quite dangerous and costly in the real world. For this reason we investigate methods of training an AV without allowing the agent to explore and instead having a human explorer collect the data. Supervised learning has been explored for AV control, but it encounters the issue of the covariate shift. That is, training data collected from an optimal demonstration consists only of the states induced by the optimal control policy, but at runtime, the trained agent may encounter a vastly different state distribution with little relevant training data. To mitigate this issue, we have our human explorer make sub-optimal decisions. In order to have our agent not replicate these sub-optimal decisions, supervised learning requires that we either erase these actions, or replace these action with the correct action. Erasing is wasteful and replacing is difficult, since it is not easy to know the correct action without driving. We propose an alternate framework that includes continuous scalar feedback for each action, marking which actions we should replicate, which we should avoid, and how sure we are. Our framework learns continuous control from sub-optimal demonstration and evaluative feedback collected before training. We find that a human demonstrator can explore sub-optimal states in a safe manner, while still getting enough gradation to benefit learning. The collection method for data and feedback we call “Backseat Driver.” We call the more general learning framework ReNeg, since it learns a regression from states to actions given negative as well as positive examples. We empirically validate several models in the ReNeg framework, testing on lane-following with limited data. We find that the best solution is a generalization of mean-squared error and outperforms supervised learning on the positive examples alone.
http://arxiv.org/abs/1901.05101
In this work, we investigate the feasibility and effectiveness of employing deep learning algorithms for automatic recognition of the modulation type of received wireless communication signals from subsampled data. Recent work considered a GNU radio-based data set that mimics the imperfections in a real wireless channel and uses 10 different modulation types. A Convolutional Neural Network (CNN) architecture was then developed and shown to achieve performance that exceeds that of expert-based approaches. Here, we continue this line of work and investigate deep neural network architectures that deliver high classification accuracy. We identify three architectures - namely, a Convolutional Long Short-term Deep Neural Network (CLDNN), a Long Short-Term Memory neural network (LSTM), and a deep Residual Network (ResNet) - that lead to typical classification accuracy values around 90% at high SNR. We then study algorithms to reduce the training time by minimizing the size of the training data set, while incurring a minimal loss in classification accuracy. To this end, we demonstrate the performance of Principal Component Analysis in significantly reducing the training time, while maintaining good performance at low SNR. We also investigate subsampling techniques that further reduce the training time, and pave the way for online classification at high SNR. Finally, we identify representative SNR values for training each of the candidate architectures, and consequently, realize drastic reductions of the training time, with negligible loss in classification accuracy.
https://arxiv.org/abs/1901.05850
Language is an extremely interesting subject to study, each day presenting new challenges and new topics for research. Words in particular have several unique characteristics which when explored, prove to be astonishing. Anagrams and Antigrams are such words possessing these amazing properties. The presented work is an exploration into generating anagrams from a given word and determining whether there exists antigram relationships between the pairs of generated anagrams in light of the Word2Vec distributional semantic similarity model. The experiments conducted, showed promising results for detecting antigrams.
http://arxiv.org/abs/1901.05066
In this paper we study deep learning-based music source separation, and explore using an alternative loss to the standard spectrogram pixel-level L2 loss for model training. Our main contribution is in demonstrating that adding a high-level feature loss term, extracted from the spectrograms using a VGG net, can improve separation quality vis-a-vis a pure pixel-level loss. We show this improvement in the context of the MMDenseNet, a State-of-the-Art deep learning model for this task, for the extraction of drums and vocal sounds from songs in the musdb18 database, covering a broad range of western music genres. We believe that this finding can be generalized and applied to broader machine learning-based systems in the audio domain.
http://arxiv.org/abs/1901.05061
Next generation of embedded Information and Communication Technology (ICT) systems are interconnected collaborative intelligent systems able to perform autonomous tasks. Training and deployment of such systems on Edge devices however require a fine-grained integration of data and tools to achieve high accuracy and overcome functional and non-functional requirements. In this work, we present a modular AI pipeline as an integrating framework to bring data, algorithms and deployment tools together. By these means, we are able to interconnect the different entities or stages of particular systems and provide an end-to-end development of AI products. We demonstrate the effectiveness of the AI pipeline by solving an Automatic Speech Recognition challenge and we show that all the steps leading to an end-to-end development for Key-word Spotting tasks: importing, partitioning and pre-processing of speech data, training of different neural network architectures and their deployment on heterogeneous embedded platforms.
http://arxiv.org/abs/1901.05049
A new approach to the tracking of sinusoidal chirps using linear programming is proposed. It is demonstrated that the classical algorithm of McAulay and Quatieri is greedy and exhibits exponential complexity for long searches, while approaches based on the Viterbi algorithm exhibit factorial complexity. A linear programming (LP) formulation to find the best $L$ paths in a lattice is described and its complexity is shown to be less than previous approaches. Finally it is demonstrated that the new LP formulation outperforms the classical algorithm in the tracking of sinusoidal chirps in high levels of noise.
http://arxiv.org/abs/1901.05044
Decanol droplets in a thin layer of sodium decanoate with sodium chloride exhibit bifurcation branching growth due to interplay between osmotic pressure, diffusion and surface tension. We aimed to evaluate if morphology of the branching droplets changes when the droplets are subject to electrical potential difference. We analysed graph-theoretic structure of the droplets and applied several complexity measures. We found that, in overall, the current increases complexity of the branching droplets in terms of number of connected components and nodes in their graph presentations, morphological complexity and compressibility.
http://arxiv.org/abs/1901.05043
We present CAM (comparative argumentative machine), a novel open-domain IR system to argumentatively compare objects with respect to information extracted from the Common Crawl. In a user study, the participants obtained 15% more accurate answers using CAM compared to a “traditional” keyword-based search and were 20% faster in finding the answer to comparative questions.
http://arxiv.org/abs/1901.05041
In this paper, we aim at solving the multi-domain image-to-image translation problem with a unified model in an unsupervised manner. The most successful work in this area refers to StarGAN, which works well in tasks like face attribute modulation. However, StarGAN is unable to match multiple translation mappings when encountering general translations with very diverse domain shifts. On the other hand, StarGAN adopts an Encoder-Decoder-Discriminator (EDD) architecture, where the model is time-consuming and unstable to train. To this end, we propose a Compact, effective, robust, and fast GAN model, termed CerfGAN, to solve the above problem. In principle, CerfGAN contains a novel component, i.e., a multi-class discriminator (MCD), which gives the model an extremely powerful ability to match multiple translation mappings. To stabilize the training process, MCD also plays a role of the encoder in CerfGAN, which saves a lot of computation and memory costs. We perform extensive experiments to verify the effectiveness of the proposed method. Quantitatively, CerfGAN is demonstrated to handle a serial of image-to-image translation tasks including style transfer, season transfer, face hallucination, etc, where the input images are sampled from diverse domains. The comparisons to several recently proposed approaches demonstrate the superiority and novelty of the proposed method.
http://arxiv.org/abs/1805.10871
We develop fast algorithms for solving the variational and game-theoretic $p$-Laplace equations on weighted graphs for $p>2$. The graph $p$-Laplacian for $p>2$ has been proposed recently as a replacement for the standard ($p=2$) graph Laplacian in semi-supervised learning problems with very few labels, where the minimizer of the graph Laplacian becomes degenerate. We present several efficient and scalable algorithms for both the variational and game-theoretic formulations, and present numerical results on synthetic data and on classification and regression problems that illustrate the effectiveness of the $p$-Laplacian for semi-supervised learning with few labels.
http://arxiv.org/abs/1901.05031